This notebook is a template listing each step you need to complete for the project.
Please fill in your code wherever there are explicit ? markers in the notebook. You are welcome to add more cells and code as you see fit.
Once you have completed all the code implementations, please export your notebook as an HTML file so the reviewers can view your code. Make sure all cell outputs are rendered correctly.
File-> Export Notebook As... -> Export Notebook as HTML
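The same export can be done from a terminal with nbconvert, which ships with Jupyter. This is a sketch: `project_notebook.ipynb` is a placeholder filename, so substitute your notebook's actual name.

```shell
# Export the completed notebook to HTML -- equivalent to the menu path above.
# "project_notebook.ipynb" is a placeholder; use your notebook's real filename.
NOTEBOOK="project_notebook.ipynb"
if command -v jupyter >/dev/null 2>&1 && [ -f "$NOTEBOOK" ]; then
    jupyter nbconvert --to html "$NOTEBOOK"
else
    echo "skipped: jupyter or $NOTEBOOK not available here"
fi
```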
There is also a writeup to complete after all code implementation is done. Please answer all questions and attach the necessary tables and charts. You can complete the writeup in either Markdown or PDF format.
Completing the code template and writeup template will cover all of the rubric points for this project.
The rubric contains "Stand Out Suggestions" for enhancing the project beyond the minimum requirements. The stand out suggestions are optional. If you decide to pursue the "stand out suggestions", you can include the code in this notebook and also discuss the results in the writeup file.
Below is an example of the steps to get the API username and key. Each student will have their own username and key.
Download kaggle.json and use the username and key it contains.
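For reference, here is a minimal sketch of registering the downloaded credentials where the Kaggle CLI expects them. The `YOUR_USERNAME`/`YOUR_KEY` values are placeholders: use the ones from your own kaggle.json.

```python
import json
import os
from pathlib import Path

# Placeholder credentials -- replace with the values from your downloaded kaggle.json.
credentials = {"username": "YOUR_USERNAME", "key": "YOUR_KEY"}

# The Kaggle CLI looks for ~/.kaggle/kaggle.json by default.
config_dir = Path.home() / ".kaggle"
config_dir.mkdir(exist_ok=True)
config_path = config_dir / "kaggle.json"
config_path.write_text(json.dumps(credentials))

# Restrict permissions so the Kaggle CLI does not warn about a world-readable key.
os.chmod(config_path, 0o600)
```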
Instance: ml.t3.medium (2 vCPU + 4 GiB)
Kernel: Python 3 (MXNet 1.8 Python 3.7 CPU Optimized)

!pip install -U pip
!pip install -U setuptools wheel
!pip install -U "mxnet<2.0.0" bokeh==2.0.1
!pip install autogluon --no-cache-dir
# Without --no-cache-dir, smaller aws instances may have trouble installing
Requirement already satisfied: pip in /usr/local/lib/python3.8/dist-packages (21.3.1)
Collecting pip
  Using cached pip-22.3.1-py3-none-any.whl (2.1 MB)
Successfully installed pip-22.3.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Successfully installed setuptools-66.1.1 wheel-0.38.4
Collecting mxnet<2.0.0
  Using cached mxnet-1.9.1-py3-none-manylinux2014_x86_64.whl (49.1 MB)
Collecting bokeh==2.0.1
  Using cached bokeh-2.0.1-py3-none-any.whl
Successfully installed bokeh-2.0.1 mxnet-1.9.1
Collecting autogluon
  Downloading autogluon-0.6.2-py3-none-any.whl (9.8 kB)
[dependency-resolution output trimmed: the autogluon 0.6.2 submodules (core, common, features, tabular, multimodal, text, timeseries, vision) and their pinned dependencies, among them torch-1.12.1, torchvision-0.13.1, ray-2.0.1, xgboost-1.7.3, catboost-1.1.1, lightgbm-3.3.4, and transformers-4.23.1, are downloaded here]
Collecting cachetools<6.0,>=2.0.0
Downloading cachetools-5.3.0-py3-none-any.whl (9.3 kB)
Collecting requests-oauthlib>=0.7.0
Downloading requests_oauthlib-1.3.1-py2.py3-none-any.whl (23 kB)
Requirement already satisfied: importlib-metadata>=4.4 in /usr/local/lib/python3.8/dist-packages (from markdown>=2.6.8->tensorboard>=2.9.1->pytorch-lightning<1.8.0,>=1.7.4->autogluon.multimodal==0.6.2->autogluon) (4.10.1)
Collecting mdurl~=0.1
Downloading mdurl-0.1.2-py3-none-any.whl (10.0 kB)
Collecting confection<1.0.0,>=0.0.1
Downloading confection-0.0.4-py3-none-any.whl (32 kB)
Collecting blis<0.8.0,>=0.7.8
Downloading blis-0.7.9-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (10.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.2/10.2 MB 228.5 MB/s eta 0:00:00a 0:00:01
Collecting yarl<2.0,>=1.0
Downloading yarl-1.8.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (262 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 262.1/262.1 kB 307.4 MB/s eta 0:00:00
Collecting async-timeout<5.0,>=4.0.0a3
Downloading async_timeout-4.0.2-py3-none-any.whl (5.8 kB)
Collecting multidict<7.0,>=4.5
Downloading multidict-6.0.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (121 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.3/121.3 kB 275.5 MB/s eta 0:00:00
Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /usr/local/lib/python3.8/dist-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard>=2.9.1->pytorch-lightning<1.8.0,>=1.7.4->autogluon.multimodal==0.6.2->autogluon) (0.4.8)
Collecting oauthlib>=3.0.0
Downloading oauthlib-3.2.2-py3-none-any.whl (151 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 151.7/151.7 kB 306.9 MB/s eta 0:00:00
Building wheels for collected packages: fairscale, antlr4-python3-runtime, seqeval, future
Building wheel for fairscale (pyproject.toml) ... done
Created wheel for fairscale: filename=fairscale-0.4.6-py3-none-any.whl size=307224 sha256=1150205fdf93ac4671be1dcd864a94c69055e9f56f2e5ddc638e35293dbbeee9
Stored in directory: /tmp/pip-ephem-wheel-cache-m9dqe9h4/wheels/60/e8/f1/4f2cc869823c35e834c6cee0552a0605c2bdc89f7da81f1a1d
Building wheel for antlr4-python3-runtime (setup.py) ... done
Created wheel for antlr4-python3-runtime: filename=antlr4_python3_runtime-4.8-py3-none-any.whl size=141211 sha256=d53fafbd0f048981ae67cdbe3d132d57709f4ea36d6e0bd9124fdf9d1eceda37
Stored in directory: /tmp/pip-ephem-wheel-cache-m9dqe9h4/wheels/34/d7/fe/a833ceccaee881c6f8cd49985ee4285bf94c5cf2c66ea5db68
Building wheel for seqeval (setup.py) ... done
Created wheel for seqeval: filename=seqeval-1.2.2-py3-none-any.whl size=16164 sha256=5fb3a6f1ebf73bdc2f7679ae4b8ad4f630a56c56aafb9408ac63aae1b882e704
Stored in directory: /tmp/pip-ephem-wheel-cache-m9dqe9h4/wheels/e3/30/9b/6b670dac34775f2b7cc4e9b172202e81fbb4f9cdb103c1ca66
Building wheel for future (setup.py) ... done
Created wheel for future: filename=future-0.18.3-py3-none-any.whl size=492025 sha256=77a91421a9d28e092163c947558bb81a1b74e973ebf164a9d4765a922858f5c0
Stored in directory: /tmp/pip-ephem-wheel-cache-m9dqe9h4/wheels/a6/db/41/71a0e5d071a14e716cc11bb021a9caa8f76ec337eca071487e
Successfully built fairscale antlr4-python3-runtime seqeval future
Installing collected packages: typish, tokenizers, text-unidecode, tensorboard-plugin-wit, sortedcontainers, sentencepiece, py4j, msgpack, heapdict, distlib, cymem, antlr4-python3-runtime, zict, yacs, xxhash, wrapt, wasabi, typing-extensions, tqdm, toolz, tensorboard-data-server, tblib, spacy-loggers, spacy-legacy, smart-open, regex, pyrsistent, pyDeprecate, pyasn1-modules, platformdirs, Pillow, ordered-set, omegaconf, oauthlib, numpy, networkx, murmurhash, multidict, mdurl, locket, langcodes, importlib-resources, grpcio, future, frozenlist, filelock, fastprogress, defusedxml, click, catalogue, cachetools, autocfg, async-timeout, absl-py, yarl, virtualenv, typer, torch, tifffile, tensorboardX, srsly, responses, requests-oauthlib, PyWavelets, pydantic, preshed, patsy, partd, opencv-python-headless, nptyping, nltk, markdown-it-py, markdown, jsonschema, huggingface-hub, google-auth, fastcore, deprecated, blis, aiosignal, xgboost, transformers, torchvision, torchtext, torchmetrics, statsmodels, scikit-image, rich, ray, pathy, nlpaug, model-index, hyperopt, google-auth-oauthlib, gluonts, gluoncv, fastdownload, fairscale, dask, confection, catboost, aiohttp, accelerate, timm, thinc, tensorboard, sktime, seqeval, qudida, pytorch-metric-learning, pmdarima, openmim, lightgbm, distributed, tbats, spacy, pytorch-lightning, datasets, autogluon.common, albumentations, fastai, evaluate, autogluon.features, autogluon.core, autogluon.tabular, autogluon.multimodal, autogluon.vision, autogluon.timeseries, autogluon.text, autogluon
Attempting uninstall: typing-extensions
Found existing installation: typing_extensions 4.0.1
Uninstalling typing_extensions-4.0.1:
Successfully uninstalled typing_extensions-4.0.1
Attempting uninstall: tqdm
Found existing installation: tqdm 4.39.0
Uninstalling tqdm-4.39.0:
Successfully uninstalled tqdm-4.39.0
Attempting uninstall: Pillow
Found existing installation: Pillow 9.0.0
Uninstalling Pillow-9.0.0:
Successfully uninstalled Pillow-9.0.0
Attempting uninstall: numpy
Found existing installation: numpy 1.19.1
Uninstalling numpy-1.19.1:
Successfully uninstalled numpy-1.19.1
Attempting uninstall: gluoncv
Found existing installation: gluoncv 0.8.0
Uninstalling gluoncv-0.8.0:
Successfully uninstalled gluoncv-0.8.0
Successfully installed Pillow-9.4.0 PyWavelets-1.4.1 absl-py-1.4.0 accelerate-0.13.2 aiohttp-3.8.3 aiosignal-1.3.1 albumentations-1.1.0 antlr4-python3-runtime-4.8 async-timeout-4.0.2 autocfg-0.0.8 autogluon-0.6.2 autogluon.common-0.6.2 autogluon.core-0.6.2 autogluon.features-0.6.2 autogluon.multimodal-0.6.2 autogluon.tabular-0.6.2 autogluon.text-0.6.2 autogluon.timeseries-0.6.2 autogluon.vision-0.6.2 blis-0.7.9 cachetools-5.3.0 catalogue-2.0.8 catboost-1.1.1 click-8.0.4 confection-0.0.4 cymem-2.0.7 dask-2021.11.2 datasets-2.8.0 defusedxml-0.7.1 deprecated-1.2.13 distlib-0.3.6 distributed-2021.11.2 evaluate-0.3.0 fairscale-0.4.6 fastai-2.7.10 fastcore-1.5.27 fastdownload-0.0.7 fastprogress-1.0.3 filelock-3.9.0 frozenlist-1.3.3 future-0.18.3 gluoncv-0.10.5.post0 gluonts-0.11.8 google-auth-2.16.0 google-auth-oauthlib-0.4.6 grpcio-1.43.0 heapdict-1.0.1 huggingface-hub-0.11.1 hyperopt-0.2.7 importlib-resources-5.10.2 jsonschema-4.8.0 langcodes-3.3.0 lightgbm-3.3.4 locket-1.0.0 markdown-3.4.1 markdown-it-py-2.1.0 mdurl-0.1.2 model-index-0.1.11 msgpack-1.0.4 multidict-6.0.4 murmurhash-1.0.9 networkx-2.8.8 nlpaug-1.1.10 nltk-3.8.1 nptyping-1.4.4 numpy-1.21.6 oauthlib-3.2.2 omegaconf-2.1.2 opencv-python-headless-4.7.0.68 openmim-0.2.1 ordered-set-4.1.0 partd-1.3.0 pathy-0.10.1 patsy-0.5.3 platformdirs-2.6.2 pmdarima-1.8.5 preshed-3.0.8 py4j-0.10.9.7 pyDeprecate-0.3.2 pyasn1-modules-0.2.8 pydantic-1.10.4 pyrsistent-0.19.3 pytorch-lightning-1.7.7 pytorch-metric-learning-1.3.2 qudida-0.0.4 ray-2.0.1 regex-2022.10.31 requests-oauthlib-1.3.1 responses-0.18.0 rich-13.2.0 scikit-image-0.19.3 sentencepiece-0.1.97 seqeval-1.2.2 sktime-0.13.4 smart-open-5.2.1 sortedcontainers-2.4.0 spacy-3.5.0 spacy-legacy-3.0.12 spacy-loggers-1.0.4 srsly-2.4.5 statsmodels-0.13.5 tbats-1.1.2 tblib-1.7.0 tensorboard-2.11.2 tensorboard-data-server-0.6.1 tensorboard-plugin-wit-1.8.1 tensorboardX-2.5.1 text-unidecode-1.3 thinc-8.1.7 tifffile-2023.1.23.1 timm-0.6.12 tokenizers-0.13.2 toolz-0.12.0 torch-1.12.1 torchmetrics-0.8.2 torchtext-0.13.1 torchvision-0.13.1 tqdm-4.64.1 transformers-4.23.1 typer-0.7.0 typing-extensions-4.4.0 typish-1.9.3 virtualenv-20.17.1 wasabi-1.1.1 wrapt-1.14.1 xgboost-1.7.3 xxhash-3.2.0 yacs-0.1.8 yarl-1.8.2 zict-2.2.0
cd /root
/root
ls
AutogluonModels/             submission.csv
bike-sharing-demand.zip      submission_new_features.csv
cd0385-project-starter/      submission_new_features_2.csv
histogram_hours_feature.png  submission_new_hpo.csv
model_test_score.png         test.csv
model_train_score.png        train.csv
sampleSubmission.csv
!pip install -U kaggle
Collecting kaggle
  Using cached kaggle-1.5.12-py3-none-any.whl
Requirement already satisfied: six>=1.10 in /usr/local/lib/python3.8/dist-packages (from kaggle) (1.16.0)
Collecting python-slugify
  Using cached python_slugify-7.0.0-py2.py3-none-any.whl (9.4 kB)
Requirement already satisfied: urllib3 in /usr/local/lib/python3.8/dist-packages (from kaggle) (1.26.8)
Requirement already satisfied: tqdm in /usr/local/lib/python3.8/dist-packages (from kaggle) (4.64.1)
Requirement already satisfied: python-dateutil in /usr/local/lib/python3.8/dist-packages (from kaggle) (2.8.2)
Requirement already satisfied: requests in /usr/local/lib/python3.8/dist-packages (from kaggle) (2.27.1)
Requirement already satisfied: certifi in /usr/local/lib/python3.8/dist-packages (from kaggle) (2021.10.8)
Requirement already satisfied: text-unidecode>=1.3 in /usr/local/lib/python3.8/dist-packages (from python-slugify->kaggle) (1.3)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.8/dist-packages (from requests->kaggle) (2.0.10)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.8/dist-packages (from requests->kaggle) (3.3)
Installing collected packages: python-slugify, kaggle
Successfully installed kaggle-1.5.12 python-slugify-7.0.0
# create the .kaggle directory and an empty kaggle.json file
!mkdir -p /root/.kaggle
!touch /root/.kaggle/kaggle.json
!chmod 600 /root/.kaggle/kaggle.json
# Fill in your username and key from creating the Kaggle account and API token file
import json

kaggle_username = "pghugare"
kaggle_key = "<your-kaggle-api-key>"  # keep real API keys out of shared notebooks

# Save the API token to the kaggle.json file
with open("/root/.kaggle/kaggle.json", "w") as f:
    f.write(json.dumps({"username": kaggle_username, "key": kaggle_key}))
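A quick sanity check of the token file can be done with the standard library. This is a sketch only: it writes to a temporary directory rather than `/root/.kaggle`, and the credentials are placeholders.

```python
import json
import os
import stat
import tempfile

# Sketch: write and verify a kaggle.json-style token file.
# The notebook writes to /root/.kaggle/kaggle.json; a temp dir is used here.
token_path = os.path.join(tempfile.mkdtemp(), "kaggle.json")
with open(token_path, "w") as f:
    json.dump({"username": "your-username", "key": "your-key"}, f)
os.chmod(token_path, 0o600)  # the Kaggle CLI warns unless the token is user-only readable

with open(token_path) as f:
    creds = json.load(f)
mode = stat.S_IMODE(os.stat(token_path).st_mode)
print(creds["username"], oct(mode))  # your-username 0o600
```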
# Download the dataset, it will be in a .zip file so you'll need to unzip it as well.
!kaggle competitions download -c bike-sharing-demand
# If you already downloaded it, you can use the -o flag to overwrite the files
!unzip -o bike-sharing-demand.zip
Downloading bike-sharing-demand.zip to /root
100%|████████████████████████████████████████| 189k/189k [00:00<00:00, 6.16MB/s]
Archive:  bike-sharing-demand.zip
  inflating: sampleSubmission.csv
  inflating: test.csv
  inflating: train.csv
import pandas as pd
from autogluon.tabular import TabularPredictor
/usr/local/lib/python3.8/dist-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
# Create the train dataset in pandas by reading the csv
# Set the parsing of the datetime column so you can use some of the `dt` features in pandas later
train = pd.read_csv('train.csv', parse_dates=['datetime'])
train.head()
|   | datetime | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | casual | registered | count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2011-01-01 00:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 81 | 0.0 | 3 | 13 | 16 |
| 1 | 2011-01-01 01:00:00 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 8 | 32 | 40 |
| 2 | 2011-01-01 02:00:00 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 5 | 27 | 32 |
| 3 | 2011-01-01 03:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 3 | 10 | 13 |
| 4 | 2011-01-01 04:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 0 | 1 | 1 |
# Simple output of the train dataset to view some of the min/max/variation of the dataset features.
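The summary the comment above refers to can be produced with `DataFrame.describe()`; a minimal sketch using a few synthetic rows in the same schema (values here are illustrative, not from the real train.csv):

```python
import pandas as pd

# describe() reports count/mean/std/min/quartiles/max for each numeric column
df = pd.DataFrame({
    "temp": [9.84, 9.02, 9.02, 9.84],
    "humidity": [81, 80, 80, 75],
    "count": [16, 40, 32, 13],
})
stats = df.describe()
print(stats.loc["min", "temp"], stats.loc["max", "count"])  # 9.02 40.0
```

In the notebook, `train.describe()` gives the same view over the full dataset.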
# Create the test pandas dataframe in pandas by reading the csv, remember to parse the datetime!
test = pd.read_csv('test.csv', parse_dates=['datetime'])
test.head()
|   | datetime | season | holiday | workingday | weather | temp | atemp | humidity | windspeed |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2011-01-20 00:00:00 | 1 | 0 | 1 | 1 | 10.66 | 11.365 | 56 | 26.0027 |
| 1 | 2011-01-20 01:00:00 | 1 | 0 | 1 | 1 | 10.66 | 13.635 | 56 | 0.0000 |
| 2 | 2011-01-20 02:00:00 | 1 | 0 | 1 | 1 | 10.66 | 13.635 | 56 | 0.0000 |
| 3 | 2011-01-20 03:00:00 | 1 | 0 | 1 | 1 | 10.66 | 12.880 | 56 | 11.0014 |
| 4 | 2011-01-20 04:00:00 | 1 | 0 | 1 | 1 | 10.66 | 12.880 | 56 | 11.0014 |
# Same as the train and test datasets: read the sample submission and parse the datetime
submission = pd.read_csv('sampleSubmission.csv', parse_dates=['datetime'])
submission.head()
|   | datetime | count |
|---|---|---|
| 0 | 2011-01-20 00:00:00 | 0 |
| 1 | 2011-01-20 01:00:00 | 0 |
| 2 | 2011-01-20 02:00:00 | 0 |
| 3 | 2011-01-20 03:00:00 | 0 |
| 4 | 2011-01-20 04:00:00 | 0 |
Requirements:
- We are predicting `count`, so it is the label we are setting.
- Ignore the `casual` and `registered` columns as they are also not present in the test dataset.
- Use `root_mean_squared_error` as the metric to use for evaluation.
- Use the `best_quality` preset to focus on creating the best model.

columns_to_ignore = ["casual", "registered"]
for column_name in columns_to_ignore:
train.drop(column_name, axis='columns', inplace=True)
train.head()
# AutoGluon already defaults to RMSE for regression, but pass the metric explicitly to match the requirement
predictor = TabularPredictor(label="count", eval_metric="root_mean_squared_error").fit(train_data=train, time_limit=600, presets=["best_quality"])
|   | datetime | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | count |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2011-01-01 00:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 81 | 0.0 | 16 |
| 1 | 2011-01-01 01:00:00 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 40 |
| 2 | 2011-01-01 02:00:00 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 32 |
| 3 | 2011-01-01 03:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 13 |
| 4 | 2011-01-01 04:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 1 |
predictor.fit_summary()
*** Summary of fit() ***
Estimated performance of each model:
model score_val pred_time_val fit_time pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 WeightedEnsemble_L3 -52.885174 11.174019 505.817829 0.000712 0.311318 3 True 15
1 RandomForestMSE_BAG_L2 -53.393270 10.297766 406.352121 0.606169 27.262657 2 True 12
2 ExtraTreesMSE_BAG_L2 -54.021078 10.297608 387.015604 0.606012 7.926141 2 True 14
3 LightGBM_BAG_L2 -55.101032 9.907717 400.566041 0.216120 21.476578 2 True 11
4 CatBoost_BAG_L2 -55.705478 9.745007 448.841135 0.053410 69.751672 2 True 13
5 LightGBMXT_BAG_L2 -60.705655 13.691409 432.604342 3.999813 53.514878 2 True 10
6 KNeighborsDist_BAG_L1 -84.125061 0.038620 0.029779 0.038620 0.029779 1 True 2
7 WeightedEnsemble_L2 -84.125061 0.039294 0.477109 0.000675 0.447330 2 True 9
8 KNeighborsUnif_BAG_L1 -101.546199 0.039820 0.032498 0.039820 0.032498 1 True 1
9 RandomForestMSE_BAG_L1 -116.544294 0.587877 10.327482 0.587877 10.327482 1 True 5
10 ExtraTreesMSE_BAG_L1 -124.588053 0.682145 4.782058 0.682145 4.782058 1 True 7
11 CatBoost_BAG_L1 -130.485847 0.124086 196.887096 0.124086 196.887096 1 True 6
12 LightGBM_BAG_L1 -131.054162 1.392111 26.093218 1.392111 26.093218 1 True 4
13 LightGBMXT_BAG_L1 -131.460909 6.493901 60.742787 6.493901 60.742787 1 True 3
14 NeuralNetFastAI_BAG_L1 -136.539545 0.333037 80.194546 0.333037 80.194546 1 True 8
Number of models trained: 15
Types of models trained:
{'StackerEnsembleModel_LGB', 'StackerEnsembleModel_CatBoost', 'StackerEnsembleModel_KNN', 'StackerEnsembleModel_RF', 'StackerEnsembleModel_NNFastAiTabular', 'StackerEnsembleModel_XT', 'WeightedEnsembleModel'}
Bagging used: True (with 8 folds)
Multi-layer stack-ensembling used: True (with 3 levels)
Feature Metadata (Processed):
(raw dtype, special dtypes):
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 3 | ['season', 'weather', 'humidity']
('int', ['bool']) : 2 | ['holiday', 'workingday']
('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
Plot summary of models saved to file: AutogluonModels/ag-20230123_035239/SummaryOfModels.html
*** End of fit() summary ***
{'model_types': {'KNeighborsUnif_BAG_L1': 'StackerEnsembleModel_KNN',
'KNeighborsDist_BAG_L1': 'StackerEnsembleModel_KNN',
'LightGBMXT_BAG_L1': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L1': 'StackerEnsembleModel_LGB',
'RandomForestMSE_BAG_L1': 'StackerEnsembleModel_RF',
'CatBoost_BAG_L1': 'StackerEnsembleModel_CatBoost',
'ExtraTreesMSE_BAG_L1': 'StackerEnsembleModel_XT',
'NeuralNetFastAI_BAG_L1': 'StackerEnsembleModel_NNFastAiTabular',
'WeightedEnsemble_L2': 'WeightedEnsembleModel',
'LightGBMXT_BAG_L2': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L2': 'StackerEnsembleModel_LGB',
'RandomForestMSE_BAG_L2': 'StackerEnsembleModel_RF',
'CatBoost_BAG_L2': 'StackerEnsembleModel_CatBoost',
'ExtraTreesMSE_BAG_L2': 'StackerEnsembleModel_XT',
'WeightedEnsemble_L3': 'WeightedEnsembleModel'},
'model_performance': {'KNeighborsUnif_BAG_L1': -101.54619908446061,
'KNeighborsDist_BAG_L1': -84.12506123181602,
'LightGBMXT_BAG_L1': -131.46090891834504,
'LightGBM_BAG_L1': -131.054161598899,
'RandomForestMSE_BAG_L1': -116.54429428704391,
'CatBoost_BAG_L1': -130.48584656124748,
'ExtraTreesMSE_BAG_L1': -124.58805258915959,
'NeuralNetFastAI_BAG_L1': -136.5395454996815,
'WeightedEnsemble_L2': -84.12506123181602,
'LightGBMXT_BAG_L2': -60.705655042275914,
'LightGBM_BAG_L2': -55.10103226835344,
'RandomForestMSE_BAG_L2': -53.39326979793196,
'CatBoost_BAG_L2': -55.705477592793756,
'ExtraTreesMSE_BAG_L2': -54.02107813705378,
'WeightedEnsemble_L3': -52.88517418301905},
'model_best': 'WeightedEnsemble_L3',
'model_paths': {'KNeighborsUnif_BAG_L1': 'AutogluonModels/ag-20230123_035239/models/KNeighborsUnif_BAG_L1/',
'KNeighborsDist_BAG_L1': 'AutogluonModels/ag-20230123_035239/models/KNeighborsDist_BAG_L1/',
'LightGBMXT_BAG_L1': 'AutogluonModels/ag-20230123_035239/models/LightGBMXT_BAG_L1/',
'LightGBM_BAG_L1': 'AutogluonModels/ag-20230123_035239/models/LightGBM_BAG_L1/',
'RandomForestMSE_BAG_L1': 'AutogluonModels/ag-20230123_035239/models/RandomForestMSE_BAG_L1/',
'CatBoost_BAG_L1': 'AutogluonModels/ag-20230123_035239/models/CatBoost_BAG_L1/',
'ExtraTreesMSE_BAG_L1': 'AutogluonModels/ag-20230123_035239/models/ExtraTreesMSE_BAG_L1/',
'NeuralNetFastAI_BAG_L1': 'AutogluonModels/ag-20230123_035239/models/NeuralNetFastAI_BAG_L1/',
'WeightedEnsemble_L2': 'AutogluonModels/ag-20230123_035239/models/WeightedEnsemble_L2/',
'LightGBMXT_BAG_L2': 'AutogluonModels/ag-20230123_035239/models/LightGBMXT_BAG_L2/',
'LightGBM_BAG_L2': 'AutogluonModels/ag-20230123_035239/models/LightGBM_BAG_L2/',
'RandomForestMSE_BAG_L2': 'AutogluonModels/ag-20230123_035239/models/RandomForestMSE_BAG_L2/',
'CatBoost_BAG_L2': 'AutogluonModels/ag-20230123_035239/models/CatBoost_BAG_L2/',
'ExtraTreesMSE_BAG_L2': 'AutogluonModels/ag-20230123_035239/models/ExtraTreesMSE_BAG_L2/',
'WeightedEnsemble_L3': 'AutogluonModels/ag-20230123_035239/models/WeightedEnsemble_L3/'},
'model_fit_times': {'KNeighborsUnif_BAG_L1': 0.03249764442443848,
'KNeighborsDist_BAG_L1': 0.02977895736694336,
'LightGBMXT_BAG_L1': 60.74278664588928,
'LightGBM_BAG_L1': 26.093218088150024,
'RandomForestMSE_BAG_L1': 10.327482461929321,
'CatBoost_BAG_L1': 196.88709592819214,
'ExtraTreesMSE_BAG_L1': 4.782058477401733,
'NeuralNetFastAI_BAG_L1': 80.19454550743103,
'WeightedEnsemble_L2': 0.4473297595977783,
'LightGBMXT_BAG_L2': 53.514878034591675,
'LightGBM_BAG_L2': 21.476577758789062,
'RandomForestMSE_BAG_L2': 27.262657165527344,
'CatBoost_BAG_L2': 69.75167155265808,
'ExtraTreesMSE_BAG_L2': 7.926140785217285,
'WeightedEnsemble_L3': 0.31131768226623535},
'model_pred_times': {'KNeighborsUnif_BAG_L1': 0.039820194244384766,
'KNeighborsDist_BAG_L1': 0.0386197566986084,
'LightGBMXT_BAG_L1': 6.493901491165161,
'LightGBM_BAG_L1': 1.39211106300354,
'RandomForestMSE_BAG_L1': 0.5878767967224121,
'CatBoost_BAG_L1': 0.12408566474914551,
'ExtraTreesMSE_BAG_L1': 0.6821451187133789,
'NeuralNetFastAI_BAG_L1': 0.3330366611480713,
'WeightedEnsemble_L2': 0.0006747245788574219,
'LightGBMXT_BAG_L2': 3.999812602996826,
'LightGBM_BAG_L2': 0.21611976623535156,
'RandomForestMSE_BAG_L2': 0.6061689853668213,
'CatBoost_BAG_L2': 0.05341005325317383,
'ExtraTreesMSE_BAG_L2': 0.6060116291046143,
'WeightedEnsemble_L3': 0.0007116794586181641},
'num_bag_folds': 8,
'max_stack_level': 3,
'model_hyperparams': {'KNeighborsUnif_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'KNeighborsDist_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'LightGBMXT_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'RandomForestMSE_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'CatBoost_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'ExtraTreesMSE_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'NeuralNetFastAI_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'WeightedEnsemble_L2': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMXT_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'RandomForestMSE_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'CatBoost_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'ExtraTreesMSE_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'WeightedEnsemble_L3': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True}},
'leaderboard': model score_val pred_time_val fit_time \
0 WeightedEnsemble_L3 -52.885174 11.174019 505.817829
1 RandomForestMSE_BAG_L2 -53.393270 10.297766 406.352121
2 ExtraTreesMSE_BAG_L2 -54.021078 10.297608 387.015604
3 LightGBM_BAG_L2 -55.101032 9.907717 400.566041
4 CatBoost_BAG_L2 -55.705478 9.745007 448.841135
5 LightGBMXT_BAG_L2 -60.705655 13.691409 432.604342
6 KNeighborsDist_BAG_L1 -84.125061 0.038620 0.029779
7 WeightedEnsemble_L2 -84.125061 0.039294 0.477109
8 KNeighborsUnif_BAG_L1 -101.546199 0.039820 0.032498
9 RandomForestMSE_BAG_L1 -116.544294 0.587877 10.327482
10 ExtraTreesMSE_BAG_L1 -124.588053 0.682145 4.782058
11 CatBoost_BAG_L1 -130.485847 0.124086 196.887096
12 LightGBM_BAG_L1 -131.054162 1.392111 26.093218
13 LightGBMXT_BAG_L1 -131.460909 6.493901 60.742787
14 NeuralNetFastAI_BAG_L1 -136.539545 0.333037 80.194546
pred_time_val_marginal fit_time_marginal stack_level can_infer \
0 0.000712 0.311318 3 True
1 0.606169 27.262657 2 True
2 0.606012 7.926141 2 True
3 0.216120 21.476578 2 True
4 0.053410 69.751672 2 True
5 3.999813 53.514878 2 True
6 0.038620 0.029779 1 True
7 0.000675 0.447330 2 True
8 0.039820 0.032498 1 True
9 0.587877 10.327482 1 True
10 0.682145 4.782058 1 True
11 0.124086 196.887096 1 True
12 1.392111 26.093218 1 True
13 6.493901 60.742787 1 True
14 0.333037 80.194546 1 True
fit_order
0 15
1 12
2 14
3 11
4 13
5 10
6 2
7 9
8 1
9 5
10 7
11 6
12 4
13 3
14 8 }
predictions = predictor.predict(test)
predictions.head()
0    23.318344
1    42.508015
2    45.909454
3    48.781364
4    51.674591
Name: count, dtype: float32
# Describe the `predictions` series to see if there are any negative values
predictions.describe()
count    6493.000000
mean      100.555389
std        90.140991
min         3.055882
25%        20.927584
50%        62.675346
75%       168.302856
max       364.284882
Name: count, dtype: float64
# How many negative values do we have?
(predictions < 0).sum()
0
# Set them to zero
predictions[predictions < 0] = 0
predictions.describe()
count    6493.000000
mean      100.555389
std        90.140991
min         3.055882
25%        20.927584
50%        62.675346
75%       168.302856
max       364.284882
Name: count, dtype: float64
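An equivalent, more idiomatic way to floor predictions at zero is `Series.clip`. A small sketch on synthetic values (the competition's RMSLE scorer cannot score negative counts, so this guard matters even when, as above, no negatives appear):

```python
import pandas as pd

# clip(lower=0) floors any negative prediction at zero without touching the rest
preds = pd.Series([-3.2, 0.0, 23.3, 42.5], name="count")
clipped = preds.clip(lower=0)
print(clipped.tolist())  # [0.0, 0.0, 23.3, 42.5]
```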
submission["count"] = predictions
submission.to_csv("submission.csv", index=False)
!kaggle competitions submit -c bike-sharing-demand -f submission.csv -m "first raw submission"
100%|█████████████████████████████████████████| 188k/188k [00:00<00:00, 357kB/s]
Successfully submitted to Bike Sharing Demand
My Submissions

!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6
fileName        date                 description           status    publicScore  privateScore
--------------  -------------------  --------------------  --------  -----------  ------------
submission.csv  2023-01-23 04:09:07  first raw submission  complete  1.80760      1.80760
Initial score: 1.80760

# Create a histogram of all features to show the distribution of each one relative to the data. This is part of the exploratory data analysis
train.hist(figsize=(15,15))
array([[<AxesSubplot:title={'center':'datetime'}>,
<AxesSubplot:title={'center':'season'}>,
<AxesSubplot:title={'center':'holiday'}>],
[<AxesSubplot:title={'center':'workingday'}>,
<AxesSubplot:title={'center':'weather'}>,
<AxesSubplot:title={'center':'temp'}>],
[<AxesSubplot:title={'center':'atemp'}>,
<AxesSubplot:title={'center':'humidity'}>,
<AxesSubplot:title={'center':'windspeed'}>],
[<AxesSubplot:title={'center':'count'}>, <AxesSubplot:>,
<AxesSubplot:>]], dtype=object)
# Create new features from the datetime column
train['year'] = train.datetime.dt.year
train['month'] = train.datetime.dt.month
train['day'] = train.datetime.dt.day
train['hour'] = train.datetime.dt.hour
train['weekday'] = train.datetime.dt.weekday
test['year'] = test.datetime.dt.year
test['month'] = test.datetime.dt.month
test['day'] = test.datetime.dt.day
test['hour'] = test.datetime.dt.hour
test['weekday'] = test.datetime.dt.weekday
train = train.drop('datetime', axis = 1)
test = test.drop('datetime', axis = 1)
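The extraction steps above can be wrapped in one helper so train and test are always transformed identically; a sketch (the name `add_datetime_features` is my own, not from the starter code):

```python
import pandas as pd

def add_datetime_features(df, col="datetime", drop=True):
    """Derive year/month/day/hour/weekday from a parsed datetime column."""
    out = df.copy()
    dt = out[col].dt
    out["year"] = dt.year
    out["month"] = dt.month
    out["day"] = dt.day
    out["hour"] = dt.hour
    out["weekday"] = dt.weekday
    if drop:
        out = out.drop(columns=[col])
    return out

# 2011-01-01 was a Saturday, so weekday should come out as 5
demo = add_datetime_features(
    pd.DataFrame({"datetime": pd.to_datetime(["2011-01-01 05:00:00"])})
)
print(demo.iloc[0]["year"], demo.iloc[0]["hour"], demo.iloc[0]["weekday"])  # 2011 5 5
```

Applied as `train = add_datetime_features(train)` and `test = add_datetime_features(test)`, this removes the risk of the two frames drifting apart.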
train.info()
test.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10886 entries, 0 to 10885
Data columns (total 14 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   season      10886 non-null  int64
 1   holiday     10886 non-null  int64
 2   workingday  10886 non-null  int64
 3   weather     10886 non-null  int64
 4   temp        10886 non-null  float64
 5   atemp       10886 non-null  float64
 6   humidity    10886 non-null  int64
 7   windspeed   10886 non-null  float64
 8   count       10886 non-null  int64
 9   hour        10886 non-null  int64
 10  year        10886 non-null  int64
 11  month       10886 non-null  int64
 12  day         10886 non-null  int64
 13  weekday     10886 non-null  int64
dtypes: float64(3), int64(11)
memory usage: 1.2 MB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6493 entries, 0 to 6492
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   season      6493 non-null   int64
 1   holiday     6493 non-null   int64
 2   workingday  6493 non-null   int64
 3   weather     6493 non-null   int64
 4   temp        6493 non-null   float64
 5   atemp       6493 non-null   float64
 6   humidity    6493 non-null   int64
 7   windspeed   6493 non-null   float64
 8   hour        6493 non-null   int64
 9   year        6493 non-null   int64
 10  month       6493 non-null   int64
 11  day         6493 non-null   int64
 12  weekday     6493 non-null   int64
dtypes: float64(3), int64(10)
memory usage: 659.6 KB
train.corr()
|   | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | count | hour | year | month | day | weekday |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| season | 1.000000 | 0.029368 | -0.008126 | 0.008879 | 0.258689 | 0.264744 | 0.190610 | -0.147121 | 0.163439 | -0.006546 | -0.004797 | 0.971524 | 0.001729 | -0.010553 |
| holiday | 0.029368 | 1.000000 | -0.250491 | -0.007074 | 0.000295 | -0.005215 | 0.001929 | 0.008409 | -0.005393 | -0.000354 | 0.012021 | 0.001731 | -0.015877 | -0.191832 |
| workingday | -0.008126 | -0.250491 | 1.000000 | 0.033772 | 0.029966 | 0.024660 | -0.010880 | 0.013373 | 0.011594 | 0.002780 | -0.002482 | -0.003394 | 0.009829 | -0.704267 |
| weather | 0.008879 | -0.007074 | 0.033772 | 1.000000 | -0.055035 | -0.055376 | 0.406244 | 0.007261 | -0.128655 | -0.022740 | -0.012548 | 0.012144 | -0.007890 | -0.047692 |
| temp | 0.258689 | 0.000295 | 0.029966 | -0.055035 | 1.000000 | 0.984948 | -0.064949 | -0.017852 | 0.394454 | 0.145430 | 0.061226 | 0.257589 | 0.015551 | -0.038466 |
| atemp | 0.264744 | -0.005215 | 0.024660 | -0.055376 | 0.984948 | 1.000000 | -0.043536 | -0.057473 | 0.389784 | 0.140343 | 0.058540 | 0.264173 | 0.011866 | -0.040235 |
| humidity | 0.190610 | 0.001929 | -0.010880 | 0.406244 | -0.064949 | -0.043536 | 1.000000 | -0.318607 | -0.317371 | -0.278011 | -0.078606 | 0.204537 | -0.011335 | -0.026507 |
| windspeed | -0.147121 | 0.008409 | 0.013373 | 0.007261 | -0.017852 | -0.057473 | -0.318607 | 1.000000 | 0.101369 | 0.146631 | -0.015221 | -0.150192 | 0.036157 | -0.024804 |
| count | 0.163439 | -0.005393 | 0.011594 | -0.128655 | 0.394454 | 0.389784 | -0.317371 | 0.101369 | 1.000000 | 0.400601 | 0.260403 | 0.166862 | 0.019826 | -0.002283 |
| hour | -0.006546 | -0.000354 | 0.002780 | -0.022740 | 0.145430 | 0.140343 | -0.278011 | 0.146631 | 0.400601 | 1.000000 | -0.004234 | -0.006818 | 0.001132 | -0.002925 |
| year | -0.004797 | 0.012021 | -0.002482 | -0.012548 | 0.061226 | 0.058540 | -0.078606 | -0.015221 | 0.260403 | -0.004234 | 1.000000 | -0.004932 | 0.001800 | -0.003785 |
| month | 0.971524 | 0.001731 | -0.003394 | 0.012144 | 0.257589 | 0.264173 | 0.204537 | -0.150192 | 0.166862 | -0.006818 | -0.004932 | 1.000000 | 0.001974 | -0.002266 |
| day | 0.001729 | -0.015877 | 0.009829 | -0.007890 | 0.015551 | 0.011866 | -0.011335 | 0.036157 | 0.019826 | 0.001132 | 0.001800 | 0.001974 | 1.000000 | -0.011070 |
| weekday | -0.010553 | -0.191832 | -0.704267 | -0.047692 | -0.038466 | -0.040235 | -0.026507 | -0.024804 | -0.002283 | -0.002925 | -0.003785 | -0.002266 | -0.011070 | 1.000000 |
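The full matrix is dense; sorting the `count` column of the correlation matrix pulls out the strongest predictors directly. A minimal sketch on a toy frame (the column values here are made up; on the real data you would call this on `train`):

```python
import pandas as pd

# Toy frame standing in for the real train DataFrame (values are assumptions)
df = pd.DataFrame({
    "temp": [9.8, 9.0, 14.0, 20.5, 30.1],
    "humidity": [81, 80, 75, 60, 40],
    "count": [16, 40, 90, 180, 260],
})

# Correlation of every feature with the target, strongest first
corr_with_count = df.corr()["count"].drop("count").sort_values(ascending=False)
print(corr_with_count)
```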
import seaborn as sns
sns.clustermap(train.corr())
<seaborn.matrix.ClusterGrid at 0x7f9591f7cd00>
sns.pairplot(train)
<seaborn.axisgrid.PairGrid at 0x7f9591f7c5b0>
import matplotlib.pyplot as plt
# Five scatter plots need a 3x2 grid, not 2x2 (the last axis is left empty);
# the original code drew "hour" and "weekday" onto the same axis
fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(10, 12))
train.plot(ax=axes[0, 0], x="year", y="count", kind="scatter")
train.plot(ax=axes[0, 1], x="month", y="count", kind="scatter")
train.plot(ax=axes[1, 0], x="day", y="count", kind="scatter")
train.plot(ax=axes[1, 1], x="hour", y="count", kind="scatter")
train.plot(ax=axes[2, 0], x="weekday", y="count", kind="scatter")
<AxesSubplot:xlabel='weekday', ylabel='count'>
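Raw scatter plots of `count` against each time feature are dense; aggregating to the mean count per hour usually shows the daily demand profile more clearly. A sketch on a toy frame (the values are made up; plotting the resulting Series would give a clean line chart):

```python
import pandas as pd

# Toy hourly data standing in for train (values are assumptions)
df = pd.DataFrame({
    "hour": [0, 0, 8, 8, 17, 17],
    "count": [5, 7, 200, 220, 310, 290],
})

# Average demand per hour instead of one point per observation
mean_by_hour = df.groupby("hour")["count"].mean()
print(mean_by_hour)
```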
# Convert the label-encoded columns to pandas categoricals in both frames
for col in ["season", "weather", "holiday", "workingday"]:
    train[col] = train[col].astype("category")
    test[col] = test[col].astype("category")
# View our new features
train.head()
| | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | count | hour | year | month | day | weekday |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 81 | 0.0 | 16 | 0 | 2011 | 1 | 1 | 5 |
| 1 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 40 | 1 | 2011 | 1 | 1 | 5 |
| 2 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 32 | 2 | 2011 | 1 | 1 | 5 |
| 3 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 13 | 3 | 2011 | 1 | 1 | 5 |
| 4 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 1 | 4 | 2011 | 1 | 1 | 5 |
# View histogram of all features again now with the hour feature
train.hist(figsize=(15,15))
array([[<AxesSubplot:title={'center':'temp'}>,
<AxesSubplot:title={'center':'atemp'}>,
<AxesSubplot:title={'center':'humidity'}>],
[<AxesSubplot:title={'center':'windspeed'}>,
<AxesSubplot:title={'center':'count'}>,
<AxesSubplot:title={'center':'hour'}>],
[<AxesSubplot:title={'center':'year'}>,
<AxesSubplot:title={'center':'month'}>,
<AxesSubplot:title={'center':'day'}>],
[<AxesSubplot:title={'center':'weekday'}>, <AxesSubplot:>,
<AxesSubplot:>]], dtype=object)
## Histogram - Hours Feature
ax = train['hour'].hist()
ax.set_xlabel('hour')
ax.set_ylabel('# samples')
ax.set_title('Histogram - Hours Feature')
fig = ax.get_figure()
fig.savefig('histogram_hours_feature.png')
# Copy the train data (now including the engineered features) for this experiment;
# selecting all columns returns a new DataFrame rather than a view
train_new_features = train[train.columns.to_list()]
train_new_features.head()
| | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | count | hour | year | month | day | weekday |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 81 | 0.0 | 16 | 0 | 2011 | 1 | 1 | 5 |
| 1 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 40 | 1 | 2011 | 1 | 1 | 5 |
| 2 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 32 | 2 | 2011 | 1 | 1 | 5 |
| 3 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 13 | 3 | 2011 | 1 | 1 | 5 |
| 4 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 1 | 4 | 2011 | 1 | 1 | 5 |
predictor_new_features = TabularPredictor(label="count").fit(train_data = train_new_features, time_limit=600, presets=['best_quality'])
No path specified. Models will be saved in: "AutogluonModels/ag-20230124_040306/"
Presets specified: ['best_quality']
Stack configuration (auto_stack=True): num_stack_levels=1, num_bag_folds=8, num_bag_sets=20
Beginning AutoGluon training ... Time limit = 600s
AutoGluon will save models to "AutogluonModels/ag-20230124_040306/"
AutoGluon Version: 0.6.2
Python Version: 3.8.10
Operating System: Linux
Platform Machine: x86_64
Platform Version: #1 SMP Fri Dec 9 09:57:03 UTC 2022
Train Data Rows: 10886
Train Data Columns: 13
Label Column: count
Preprocessing data ...
AutoGluon infers your prediction problem is: 'regression' (because dtype of label-column == int and many unique label-values observed).
Label info (max, min, mean, stddev): (977, 1, 191.57413, 181.14445)
If 'regression' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
Available Memory: 2470.59 MB
Train Data (Original) Memory Usage: 0.83 MB (0.0% of available memory)
Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
Stage 1 Generators:
Fitting AsTypeFeatureGenerator...
Note: Converting 3 features to boolean dtype as they only contain 2 unique values.
Stage 2 Generators:
Fitting FillNaFeatureGenerator...
Stage 3 Generators:
Fitting IdentityFeatureGenerator...
Fitting CategoryFeatureGenerator...
Fitting CategoryMemoryMinimizeFeatureGenerator...
Stage 4 Generators:
Fitting DropUniqueFeatureGenerator...
Types of features in original data (raw dtype, special dtypes):
('category', []) : 4 | ['season', 'holiday', 'workingday', 'weather']
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 6 | ['humidity', 'hour', 'year', 'month', 'day', ...]
Types of features in processed data (raw dtype, special dtypes):
('category', []) : 2 | ['season', 'weather']
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 5 | ['humidity', 'hour', 'month', 'day', 'weekday']
('int', ['bool']) : 3 | ['holiday', 'workingday', 'year']
0.1s = Fit runtime
13 features in original data used to generate 13 features in processed data.
Train Data (Processed) Memory Usage: 0.75 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.12s ...
AutoGluon will gauge predictive performance using evaluation metric: 'root_mean_squared_error'
This metric's sign has been flipped to adhere to being higher_is_better. The metric score can be multiplied by -1 to get the metric value.
To change this, specify the eval_metric parameter of Predictor()
AutoGluon will fit 2 stack levels (L1 to L2) ...
Fitting 11 L1 models ...
Fitting model: KNeighborsUnif_BAG_L1 ... Training model for up to 399.82s of the 599.87s of remaining time.
-119.9788 = Validation score (-root_mean_squared_error)
0.03s = Training runtime
0.16s = Validation runtime
Fitting model: KNeighborsDist_BAG_L1 ... Training model for up to 396.22s of the 596.27s of remaining time.
-115.0385 = Validation score (-root_mean_squared_error)
0.03s = Training runtime
0.19s = Validation runtime
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 395.89s of the 595.94s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-34.7488 = Validation score (-root_mean_squared_error)
81.6s = Training runtime
12.89s = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 306.22s of the 506.27s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-34.4395 = Validation score (-root_mean_squared_error)
41.29s = Training runtime
4.74s = Validation runtime
Fitting model: RandomForestMSE_BAG_L1 ... Training model for up to 260.7s of the 460.76s of remaining time.
-38.9875 = Validation score (-root_mean_squared_error)
9.32s = Training runtime
0.55s = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 248.46s of the 448.52s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-34.8825 = Validation score (-root_mean_squared_error)
209.71s = Training runtime
0.18s = Validation runtime
Fitting model: ExtraTreesMSE_BAG_L1 ... Training model for up to 35.55s of the 235.61s of remaining time.
-38.9384 = Validation score (-root_mean_squared_error)
5.02s = Training runtime
0.55s = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 27.57s of the 227.62s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-88.6188 = Validation score (-root_mean_squared_error)
42.78s = Training runtime
0.39s = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L2 ... Training model for up to 360.0s of the 180.21s of remaining time.
-32.7532 = Validation score (-root_mean_squared_error)
0.61s = Training runtime
0.0s = Validation runtime
Fitting 9 L2 models ...
Fitting model: LightGBMXT_BAG_L2 ... Training model for up to 179.53s of the 179.51s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-33.5966 = Validation score (-root_mean_squared_error)
20.63s = Training runtime
0.34s = Validation runtime
Fitting model: LightGBM_BAG_L2 ... Training model for up to 155.34s of the 155.33s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-32.8836 = Validation score (-root_mean_squared_error)
19.25s = Training runtime
0.13s = Validation runtime
Fitting model: RandomForestMSE_BAG_L2 ... Training model for up to 132.52s of the 132.5s of remaining time.
-33.1773 = Validation score (-root_mean_squared_error)
26.99s = Training runtime
0.61s = Validation runtime
Fitting model: CatBoost_BAG_L2 ... Training model for up to 101.89s of the 101.87s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-32.8664 = Validation score (-root_mean_squared_error)
58.81s = Training runtime
0.11s = Validation runtime
Fitting model: ExtraTreesMSE_BAG_L2 ... Training model for up to 40.01s of the 39.99s of remaining time.
-32.5551 = Validation score (-root_mean_squared_error)
7.97s = Training runtime
0.6s = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L2 ... Training model for up to 29.05s of the 29.03s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-33.3914 = Validation score (-root_mean_squared_error)
44.37s = Training runtime
0.54s = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L3 ... Training model for up to 360.0s of the -18.66s of remaining time.
-32.3883 = Validation score (-root_mean_squared_error)
0.35s = Training runtime
0.0s = Validation runtime
AutoGluon training complete, total runtime = 619.19s ... Best model: "WeightedEnsemble_L3"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20230124_040306/")
predictor_new_features.fit_summary()
*** Summary of fit() ***
Estimated performance of each model:
model score_val pred_time_val fit_time pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 WeightedEnsemble_L3 -32.388291 21.638019 547.511710 0.000679 0.350135 3 True 16
1 ExtraTreesMSE_BAG_L2 -32.555143 20.238662 397.744583 0.601037 7.967542 2 True 14
2 WeightedEnsemble_L2 -32.753155 18.897724 347.555751 0.001064 0.611749 2 True 9
3 CatBoost_BAG_L2 -32.866351 19.748722 448.583476 0.111097 58.806435 2 True 13
4 LightGBM_BAG_L2 -32.883595 19.767843 409.027248 0.130219 19.250206 2 True 11
5 RandomForestMSE_BAG_L2 -33.177270 20.250873 416.766898 0.613249 26.989857 2 True 12
6 NeuralNetFastAI_BAG_L2 -33.391432 20.181738 434.147535 0.544113 44.370494 2 True 15
7 LightGBMXT_BAG_L2 -33.596635 19.975402 410.410778 0.337778 20.633736 2 True 10
8 LightGBM_BAG_L1 -34.439520 4.735687 41.291645 4.735687 41.291645 1 True 4
9 LightGBMXT_BAG_L1 -34.748807 12.885391 81.596144 12.885391 81.596144 1 True 3
10 CatBoost_BAG_L1 -34.882533 0.179237 209.710797 0.179237 209.710797 1 True 6
11 ExtraTreesMSE_BAG_L1 -38.938443 0.549690 5.022911 0.549690 5.022911 1 True 7
12 RandomForestMSE_BAG_L1 -38.987462 0.546655 9.322506 0.546655 9.322506 1 True 5
13 NeuralNetFastAI_BAG_L1 -88.618850 0.385614 42.777728 0.385614 42.777728 1 True 8
14 KNeighborsDist_BAG_L1 -115.038459 0.191050 0.027272 0.191050 0.027272 1 True 2
15 KNeighborsUnif_BAG_L1 -119.978810 0.164299 0.028038 0.164299 0.028038 1 True 1
Number of models trained: 16
Types of models trained:
{'StackerEnsembleModel_NNFastAiTabular', 'StackerEnsembleModel_KNN', 'StackerEnsembleModel_CatBoost', 'StackerEnsembleModel_LGB', 'StackerEnsembleModel_RF', 'WeightedEnsembleModel', 'StackerEnsembleModel_XT'}
Bagging used: True (with 8 folds)
Multi-layer stack-ensembling used: True (with 3 levels)
Feature Metadata (Processed):
(raw dtype, special dtypes):
('category', []) : 2 | ['season', 'weather']
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 5 | ['humidity', 'hour', 'month', 'day', 'weekday']
('int', ['bool']) : 3 | ['holiday', 'workingday', 'year']
Plot summary of models saved to file: AutogluonModels/ag-20230124_040306/SummaryOfModels.html
*** End of fit() summary ***
{'model_types': {'KNeighborsUnif_BAG_L1': 'StackerEnsembleModel_KNN',
'KNeighborsDist_BAG_L1': 'StackerEnsembleModel_KNN',
'LightGBMXT_BAG_L1': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L1': 'StackerEnsembleModel_LGB',
'RandomForestMSE_BAG_L1': 'StackerEnsembleModel_RF',
'CatBoost_BAG_L1': 'StackerEnsembleModel_CatBoost',
'ExtraTreesMSE_BAG_L1': 'StackerEnsembleModel_XT',
'NeuralNetFastAI_BAG_L1': 'StackerEnsembleModel_NNFastAiTabular',
'WeightedEnsemble_L2': 'WeightedEnsembleModel',
'LightGBMXT_BAG_L2': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L2': 'StackerEnsembleModel_LGB',
'RandomForestMSE_BAG_L2': 'StackerEnsembleModel_RF',
'CatBoost_BAG_L2': 'StackerEnsembleModel_CatBoost',
'ExtraTreesMSE_BAG_L2': 'StackerEnsembleModel_XT',
'NeuralNetFastAI_BAG_L2': 'StackerEnsembleModel_NNFastAiTabular',
'WeightedEnsemble_L3': 'WeightedEnsembleModel'},
'model_performance': {'KNeighborsUnif_BAG_L1': -119.97880966975461,
'KNeighborsDist_BAG_L1': -115.038459148802,
'LightGBMXT_BAG_L1': -34.748807067972244,
'LightGBM_BAG_L1': -34.43952035710387,
'RandomForestMSE_BAG_L1': -38.987461831485355,
'CatBoost_BAG_L1': -34.88253330930523,
'ExtraTreesMSE_BAG_L1': -38.9384425957686,
'NeuralNetFastAI_BAG_L1': -88.61884952457058,
'WeightedEnsemble_L2': -32.75315538627129,
'LightGBMXT_BAG_L2': -33.59663520640235,
'LightGBM_BAG_L2': -32.88359450763695,
'RandomForestMSE_BAG_L2': -33.17726957645052,
'CatBoost_BAG_L2': -32.86635106455899,
'ExtraTreesMSE_BAG_L2': -32.555142712792,
'NeuralNetFastAI_BAG_L2': -33.39143201082359,
'WeightedEnsemble_L3': -32.38829124232758},
'model_best': 'WeightedEnsemble_L3',
'model_paths': {'KNeighborsUnif_BAG_L1': 'AutogluonModels/ag-20230124_040306/models/KNeighborsUnif_BAG_L1/',
'KNeighborsDist_BAG_L1': 'AutogluonModels/ag-20230124_040306/models/KNeighborsDist_BAG_L1/',
'LightGBMXT_BAG_L1': 'AutogluonModels/ag-20230124_040306/models/LightGBMXT_BAG_L1/',
'LightGBM_BAG_L1': 'AutogluonModels/ag-20230124_040306/models/LightGBM_BAG_L1/',
'RandomForestMSE_BAG_L1': 'AutogluonModels/ag-20230124_040306/models/RandomForestMSE_BAG_L1/',
'CatBoost_BAG_L1': 'AutogluonModels/ag-20230124_040306/models/CatBoost_BAG_L1/',
'ExtraTreesMSE_BAG_L1': 'AutogluonModels/ag-20230124_040306/models/ExtraTreesMSE_BAG_L1/',
'NeuralNetFastAI_BAG_L1': 'AutogluonModels/ag-20230124_040306/models/NeuralNetFastAI_BAG_L1/',
'WeightedEnsemble_L2': 'AutogluonModels/ag-20230124_040306/models/WeightedEnsemble_L2/',
'LightGBMXT_BAG_L2': 'AutogluonModels/ag-20230124_040306/models/LightGBMXT_BAG_L2/',
'LightGBM_BAG_L2': 'AutogluonModels/ag-20230124_040306/models/LightGBM_BAG_L2/',
'RandomForestMSE_BAG_L2': 'AutogluonModels/ag-20230124_040306/models/RandomForestMSE_BAG_L2/',
'CatBoost_BAG_L2': 'AutogluonModels/ag-20230124_040306/models/CatBoost_BAG_L2/',
'ExtraTreesMSE_BAG_L2': 'AutogluonModels/ag-20230124_040306/models/ExtraTreesMSE_BAG_L2/',
'NeuralNetFastAI_BAG_L2': 'AutogluonModels/ag-20230124_040306/models/NeuralNetFastAI_BAG_L2/',
'WeightedEnsemble_L3': 'AutogluonModels/ag-20230124_040306/models/WeightedEnsemble_L3/'},
'model_fit_times': {'KNeighborsUnif_BAG_L1': 0.02803778648376465,
'KNeighborsDist_BAG_L1': 0.027272462844848633,
'LightGBMXT_BAG_L1': 81.59614372253418,
'LightGBM_BAG_L1': 41.29164505004883,
'RandomForestMSE_BAG_L1': 9.322505950927734,
'CatBoost_BAG_L1': 209.7107973098755,
'ExtraTreesMSE_BAG_L1': 5.0229105949401855,
'NeuralNetFastAI_BAG_L1': 42.77772831916809,
'WeightedEnsemble_L2': 0.6117486953735352,
'LightGBMXT_BAG_L2': 20.63373637199402,
'LightGBM_BAG_L2': 19.250206470489502,
'RandomForestMSE_BAG_L2': 26.989856958389282,
'CatBoost_BAG_L2': 58.806434631347656,
'ExtraTreesMSE_BAG_L2': 7.967541933059692,
'NeuralNetFastAI_BAG_L2': 44.37049412727356,
'WeightedEnsemble_L3': 0.35013484954833984},
'model_pred_times': {'KNeighborsUnif_BAG_L1': 0.16429948806762695,
'KNeighborsDist_BAG_L1': 0.19105029106140137,
'LightGBMXT_BAG_L1': 12.885390520095825,
'LightGBM_BAG_L1': 4.735687017440796,
'RandomForestMSE_BAG_L1': 0.5466549396514893,
'CatBoost_BAG_L1': 0.17923736572265625,
'ExtraTreesMSE_BAG_L1': 0.5496904850006104,
'NeuralNetFastAI_BAG_L1': 0.38561439514160156,
'WeightedEnsemble_L2': 0.0010638236999511719,
'LightGBMXT_BAG_L2': 0.33777761459350586,
'LightGBM_BAG_L2': 0.1302187442779541,
'RandomForestMSE_BAG_L2': 0.6132485866546631,
'CatBoost_BAG_L2': 0.11109709739685059,
'ExtraTreesMSE_BAG_L2': 0.6010372638702393,
'NeuralNetFastAI_BAG_L2': 0.5441131591796875,
'WeightedEnsemble_L3': 0.0006792545318603516},
'num_bag_folds': 8,
'max_stack_level': 3,
'model_hyperparams': {'KNeighborsUnif_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'KNeighborsDist_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'LightGBMXT_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'RandomForestMSE_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'CatBoost_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'ExtraTreesMSE_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'NeuralNetFastAI_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'WeightedEnsemble_L2': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMXT_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'RandomForestMSE_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'CatBoost_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'ExtraTreesMSE_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'NeuralNetFastAI_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'WeightedEnsemble_L3': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True}},
'leaderboard': model score_val pred_time_val fit_time \
0 WeightedEnsemble_L3 -32.388291 21.638019 547.511710
1 ExtraTreesMSE_BAG_L2 -32.555143 20.238662 397.744583
2 WeightedEnsemble_L2 -32.753155 18.897724 347.555751
3 CatBoost_BAG_L2 -32.866351 19.748722 448.583476
4 LightGBM_BAG_L2 -32.883595 19.767843 409.027248
5 RandomForestMSE_BAG_L2 -33.177270 20.250873 416.766898
6 NeuralNetFastAI_BAG_L2 -33.391432 20.181738 434.147535
7 LightGBMXT_BAG_L2 -33.596635 19.975402 410.410778
8 LightGBM_BAG_L1 -34.439520 4.735687 41.291645
9 LightGBMXT_BAG_L1 -34.748807 12.885391 81.596144
10 CatBoost_BAG_L1 -34.882533 0.179237 209.710797
11 ExtraTreesMSE_BAG_L1 -38.938443 0.549690 5.022911
12 RandomForestMSE_BAG_L1 -38.987462 0.546655 9.322506
13 NeuralNetFastAI_BAG_L1 -88.618850 0.385614 42.777728
14 KNeighborsDist_BAG_L1 -115.038459 0.191050 0.027272
15 KNeighborsUnif_BAG_L1 -119.978810 0.164299 0.028038
pred_time_val_marginal fit_time_marginal stack_level can_infer \
0 0.000679 0.350135 3 True
1 0.601037 7.967542 2 True
2 0.001064 0.611749 2 True
3 0.111097 58.806435 2 True
4 0.130219 19.250206 2 True
5 0.613249 26.989857 2 True
6 0.544113 44.370494 2 True
7 0.337778 20.633736 2 True
8 4.735687 41.291645 1 True
9 12.885391 81.596144 1 True
10 0.179237 209.710797 1 True
11 0.549690 5.022911 1 True
12 0.546655 9.322506 1 True
13 0.385614 42.777728 1 True
14 0.191050 0.027272 1 True
15 0.164299 0.028038 1 True
fit_order
0 16
1 14
2 9
3 13
4 11
5 12
6 15
7 10
8 4
9 3
10 6
11 7
12 5
13 8
14 2
15 1 }
predictor_new_features.leaderboard(silent=True).plot(kind="bar", x="model", y="score_val")
<AxesSubplot:xlabel='model'>
# Remember to set all negative values to zero
predictions_new_features = predictor_new_features.predict(test)
print((predictions_new_features < 0).sum())
predictions_new_features[predictions_new_features<0] = 0
predictions_new_features.describe()
0
count    6493.000000
mean      189.825287
std       173.529587
min         2.279106
25%        46.497772
50%       147.084320
75%       277.898193
max       905.759033
Name: count, dtype: float64
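The boolean-mask assignment above works; `Series.clip` expresses the same non-negativity fix in a single call. A sketch on dummy predictions (the values stand in for `predictions_new_features`):

```python
import pandas as pd

# Dummy predictions standing in for predictions_new_features
preds = pd.Series([-3.2, 0.0, 12.5, -0.1, 40.0])

# clip(lower=0) replaces every negative value with 0 in one call
preds_clipped = preds.clip(lower=0)
print(preds_clipped.tolist())  # [0.0, 0.0, 12.5, 0.0, 40.0]
```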
# The Kaggle test set ships without labels, so fill count with a dummy 0;
# the resulting evaluation scores are not meaningful against a constant target
test["count"] = 0
performance_new_features_2 = predictor_new_features.evaluate(test)
print("The performance indicators are : \n", performance_new_features_2)
/usr/local/lib/python3.8/dist-packages/scipy/stats/stats.py:4023: PearsonRConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
warnings.warn(PearsonRConstantInputWarning())
Evaluation: root_mean_squared_error on test data: -257.1799339333484
Note: Scores are always higher_is_better. This metric score can be multiplied by -1 to get the metric value.
Evaluations on test data:
{
"root_mean_squared_error": -257.1799339333484,
"mean_squared_error": -66141.51841796147,
"mean_absolute_error": -189.82528704993257,
"r2": 0.0,
"pearsonr": NaN,
"median_absolute_error": -147.08432006835938
}
The performance indicators are :
{'root_mean_squared_error': -257.1799339333484, 'mean_squared_error': -66141.51841796147, 'mean_absolute_error': -189.82528704993257, 'r2': 0.0, 'pearsonr': nan, 'median_absolute_error': -147.08432006835938}
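The degenerate `r2` of 0 and the `NaN` `pearsonr` above follow directly from the dummy labels: a constant target has zero variance. A small sketch of why, using toy numbers and plain NumPy:

```python
import numpy as np

# Dummy labels (all zero, like the test["count"] = 0 placeholder) vs. some predictions
y_true = np.zeros(5)
y_pred = np.array([10.0, 20.0, 5.0, 0.0, 3.0])

# R^2 compares model error against the variance of the truth;
# a constant truth has zero total sum of squares, so the score degenerates
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
print(ss_tot)  # 0.0

# Pearson correlation divides by the standard deviations; a constant
# input has sd 0, so the coefficient is undefined (NaN)
with np.errstate(invalid="ignore", divide="ignore"):
    r = np.corrcoef(y_true, y_pred)[0, 1]
print(r)  # nan
```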
# Submit the predictions, as before
submission_new_features = pd.read_csv('submission.csv')
submission_new_features["count"] = predictions_new_features
submission_new_features.to_csv("submission_new_features_2.csv", index=False)
# Submit the file that was just written (the original cell submitted the older
# submission_new_features.csv by mistake)
!kaggle competitions submit -c bike-sharing-demand -f submission_new_features_2.csv -m "new features 2"
100%|█████████████████████████████████████████| 188k/188k [00:00<00:00, 310kB/s]
Successfully submitted to Bike Sharing Demand
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6
fileName                     date                 description                        status    publicScore  privateScore
---------------------------  -------------------  ---------------------------------  --------  -----------  ------------
submission_new_features.csv  2023-01-24 04:22:35  new features 2                     complete  0.69366      0.69366
submission_new_hpo.csv       2023-01-23 15:19:10  new features with hyperparameters  complete  1.31738      1.31738
submission_new_features.csv  2023-01-23 05:13:22  new features                       complete  0.69366      0.69366
submission.csv               2023-01-23 04:09:07  first raw submission               complete  1.80760      1.80760
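For the writeup, the public scores from the submission log above can be collected into a small table and sorted to see which experiment performed best:

```python
import pandas as pd

# Public scores copied from the Kaggle submission log above
scores = pd.DataFrame({
    "model": ["initial", "add_features", "hpo"],
    "public_score": [1.80760, 0.69366, 1.31738],
})
print(scores.sort_values("public_score"))
```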
New public score: 0.69366

Tuning the individual model hyperparameters requires the `hyperparameters` and `hyperparameter_tune_kwargs` arguments of `fit()`.

train_new_hpo = train[train.columns.to_list()]
train_new_hpo.head()
| | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | count | hour | year | month | day | weekday |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 81 | 0.0 | 16 | 0 | 2011 | 1 | 1 | 5 |
| 1 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 40 | 1 | 2011 | 1 | 1 | 5 |
| 2 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 32 | 2 | 2011 | 1 | 1 | 5 |
| 3 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 13 | 3 | 2011 | 1 | 1 | 5 |
| 4 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 1 | 4 | 2011 | 1 | 1 | 5 |
#https://auto.gluon.ai/stable/tutorials/tabular_prediction/tabular-indepth.html#specifying-hyperparameters-and-tuning-them
import autogluon.core as ag
nn_options = { # specifies non-default hyperparameter values for neural network models
'num_epochs': 10, # number of training epochs
'learning_rate': ag.space.Real(1e-4, 1e-2, default=5e-4, log=True), # learning rate used in training (real-valued hyperparameter searched on log-scale)
'activation': ag.space.Categorical('relu', 'softrelu', 'tanh'), # activation function used in NN (categorical hyperparameter, default = first entry)
'layers': ag.space.Categorical([100], [1000], [200, 100], [300, 200, 100]), # each choice for categorical hyperparameter 'layers' corresponds to list of sizes for each NN layer to use
'dropout_prob': ag.space.Real(0.0, 0.5, default=0.1), # dropout probability (real-valued hyperparameter)
}
gbm_options = { # specifies non-default hyperparameter values for lightGBM gradient boosted trees
'num_boost_round': 100, # number of boosting rounds (controls training time of GBM models)
'num_leaves': ag.space.Int(lower=26, upper=66, default=36) # number of leaves in trees (integer hyperparameter)
}
hyperparameters = {
# hyperparameters of each model type
'GBM': gbm_options,
'NN': nn_options }
search_strategy = 'auto'
hyperparameter_tune_kwargs = {
# HPO is not performed unless hyperparameter_tune_kwargs is specified
'scheduler' : 'local', # local scheduler
'searcher': search_strategy
}
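`ag.space.Real(1e-4, 1e-2, log=True)` samples the learning rate uniformly in log space, so each decade between 1e-4 and 1e-2 gets equal probability mass. A standalone sketch of that sampling scheme (pure Python, independent of AutoGluon's internals):

```python
import math
import random

random.seed(0)

def sample_log_uniform(lower, upper):
    # Draw uniformly in log space, then map back; small learning rates
    # get the same probability mass per decade as large ones
    return math.exp(random.uniform(math.log(lower), math.log(upper)))

samples = [sample_log_uniform(1e-4, 1e-2) for _ in range(1000)]
print(f"{min(samples):.2e} .. {max(samples):.2e}")
```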
predictor_new_hpo = TabularPredictor(label="count").fit(train_data=train_new_hpo, time_limit=600, presets="best_quality", hyperparameters=hyperparameters, hyperparameter_tune_kwargs=hyperparameter_tune_kwargs)
predictor_new_hpo.fit_summary()
*** Summary of fit() ***
Estimated performance of each model:
model score_val pred_time_val fit_time pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 WeightedEnsemble_L3 -132.758746 0.002508 237.809628 0.001135 1.007823 3 True 16
1 LightGBM_BAG_L2/T1 -132.910994 0.001118 196.132830 0.000139 19.016073 2 True 11
2 LightGBM_BAG_L2/T2 -132.971214 0.001108 197.085495 0.000129 19.968738 2 True 12
3 WeightedEnsemble_L2 -133.074144 0.001216 38.909853 0.000999 0.441841 2 True 10
4 LightGBM_BAG_L2/T5 -133.191017 0.001094 197.683682 0.000115 20.566925 2 True 15
5 LightGBM_BAG_L1/T8 -133.366596 0.000085 19.524052 0.000085 19.524052 1 True 8
6 LightGBM_BAG_L2/T3 -133.454683 0.001105 197.816994 0.000126 20.700237 2 True 13
7 LightGBM_BAG_L1/T7 -134.190162 0.000132 18.943960 0.000132 18.943960 1 True 7
8 LightGBM_BAG_L1/T3 -134.194130 0.000125 19.437752 0.000125 19.437752 1 True 3
9 LightGBM_BAG_L1/T2 -135.029528 0.000127 18.838973 0.000127 18.838973 1 True 2
10 LightGBM_BAG_L1/T1 -135.473207 0.000152 23.195276 0.000152 23.195276 1 True 1
11 LightGBM_BAG_L1/T5 -135.746640 0.000085 19.472352 0.000085 19.472352 1 True 5
12 LightGBM_BAG_L2/T4 -148.765464 0.001064 196.921249 0.000085 19.804492 2 True 14
13 LightGBM_BAG_L1/T9 -152.443903 0.000086 19.152268 0.000086 19.152268 1 True 9
14 LightGBM_BAG_L1/T6 -153.737220 0.000082 19.267452 0.000082 19.267452 1 True 6
15 LightGBM_BAG_L1/T4 -156.019224 0.000105 19.284673 0.000105 19.284673 1 True 4
Number of models trained: 16
Types of models trained:
{'WeightedEnsembleModel', 'StackerEnsembleModel_LGB'}
Bagging used: True (with 8 folds)
Multi-layer stack-ensembling used: True (with 3 levels)
Feature Metadata (Processed):
(raw dtype, special dtypes):
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 3 | ['season', 'weather', 'humidity']
('int', ['bool']) : 2 | ['holiday', 'workingday']
('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
Plot summary of models saved to file: AutogluonModels/ag-20230123_150537/SummaryOfModels.html
*** End of fit() summary ***
{'model_types': {'LightGBM_BAG_L1/T1': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L1/T2': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L1/T3': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L1/T4': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L1/T5': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L1/T6': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L1/T7': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L1/T8': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L1/T9': 'StackerEnsembleModel_LGB',
'WeightedEnsemble_L2': 'WeightedEnsembleModel',
'LightGBM_BAG_L2/T1': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L2/T2': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L2/T3': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L2/T4': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L2/T5': 'StackerEnsembleModel_LGB',
'WeightedEnsemble_L3': 'WeightedEnsembleModel'},
'model_performance': {'LightGBM_BAG_L1/T1': -135.4732072756916,
'LightGBM_BAG_L1/T2': -135.02952795945737,
'LightGBM_BAG_L1/T3': -134.19413006667938,
'LightGBM_BAG_L1/T4': -156.01922351304077,
'LightGBM_BAG_L1/T5': -135.74663980949632,
'LightGBM_BAG_L1/T6': -153.73722037655872,
'LightGBM_BAG_L1/T7': -134.19016227420124,
'LightGBM_BAG_L1/T8': -133.36659565430654,
'LightGBM_BAG_L1/T9': -152.443903346809,
'WeightedEnsemble_L2': -133.07414427738493,
'LightGBM_BAG_L2/T1': -132.91099439270775,
'LightGBM_BAG_L2/T2': -132.97121381309128,
'LightGBM_BAG_L2/T3': -133.45468287620238,
'LightGBM_BAG_L2/T4': -148.76546366752737,
'LightGBM_BAG_L2/T5': -133.19101680216852,
'WeightedEnsemble_L3': -132.75874630547804},
'model_best': 'WeightedEnsemble_L3',
'model_paths': {'LightGBM_BAG_L1/T1': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L1/T1/',
'LightGBM_BAG_L1/T2': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L1/T2/',
'LightGBM_BAG_L1/T3': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L1/T3/',
'LightGBM_BAG_L1/T4': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L1/T4/',
'LightGBM_BAG_L1/T5': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L1/T5/',
'LightGBM_BAG_L1/T6': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L1/T6/',
'LightGBM_BAG_L1/T7': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L1/T7/',
'LightGBM_BAG_L1/T8': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L1/T8/',
'LightGBM_BAG_L1/T9': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L1/T9/',
'WeightedEnsemble_L2': 'AutogluonModels/ag-20230123_150537/models/WeightedEnsemble_L2/',
'LightGBM_BAG_L2/T1': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L2/T1/',
'LightGBM_BAG_L2/T2': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L2/T2/',
'LightGBM_BAG_L2/T3': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L2/T3/',
'LightGBM_BAG_L2/T4': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L2/T4/',
'LightGBM_BAG_L2/T5': '/root/AutogluonModels/ag-20230123_150537/models/LightGBM_BAG_L2/T5/',
'WeightedEnsemble_L3': 'AutogluonModels/ag-20230123_150537/models/WeightedEnsemble_L3/'},
'model_fit_times': {'LightGBM_BAG_L1/T1': 23.195276021957397,
'LightGBM_BAG_L1/T2': 18.8389732837677,
'LightGBM_BAG_L1/T3': 19.437751531600952,
'LightGBM_BAG_L1/T4': 19.28467321395874,
'LightGBM_BAG_L1/T5': 19.472351551055908,
'LightGBM_BAG_L1/T6': 19.267452001571655,
'LightGBM_BAG_L1/T7': 18.943959951400757,
'LightGBM_BAG_L1/T8': 19.524051666259766,
'LightGBM_BAG_L1/T9': 19.152267694473267,
'WeightedEnsemble_L2': 0.44184088706970215,
'LightGBM_BAG_L2/T1': 19.01607322692871,
'LightGBM_BAG_L2/T2': 19.968738079071045,
'LightGBM_BAG_L2/T3': 20.700237035751343,
'LightGBM_BAG_L2/T4': 19.804491758346558,
'LightGBM_BAG_L2/T5': 20.566925287246704,
'WeightedEnsemble_L3': 1.0078227519989014},
'model_pred_times': {'LightGBM_BAG_L1/T1': 0.00015163421630859375,
'LightGBM_BAG_L1/T2': 0.0001266002655029297,
'LightGBM_BAG_L1/T3': 0.0001251697540283203,
'LightGBM_BAG_L1/T4': 0.00010514259338378906,
'LightGBM_BAG_L1/T5': 8.463859558105469e-05,
'LightGBM_BAG_L1/T6': 8.249282836914062e-05,
'LightGBM_BAG_L1/T7': 0.0001323223114013672,
'LightGBM_BAG_L1/T8': 8.487701416015625e-05,
'LightGBM_BAG_L1/T9': 8.58306884765625e-05,
'WeightedEnsemble_L2': 0.0009992122650146484,
'LightGBM_BAG_L2/T1': 0.00013899803161621094,
'LightGBM_BAG_L2/T2': 0.0001289844512939453,
'LightGBM_BAG_L2/T3': 0.00012636184692382812,
'LightGBM_BAG_L2/T4': 8.535385131835938e-05,
'LightGBM_BAG_L2/T5': 0.00011491775512695312,
'WeightedEnsemble_L3': 0.0011348724365234375},
'num_bag_folds': 8,
'max_stack_level': 3,
'model_hyperparams': {'LightGBM_BAG_L1/T1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L1/T2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L1/T3': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L1/T4': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L1/T5': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L1/T6': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L1/T7': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L1/T8': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L1/T9': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'WeightedEnsemble_L2': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L2/T1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L2/T2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L2/T3': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L2/T4': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L2/T5': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'WeightedEnsemble_L3': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True}},
'leaderboard': model score_val pred_time_val fit_time \
0 WeightedEnsemble_L3 -132.758746 0.002508 237.809628
1 LightGBM_BAG_L2/T1 -132.910994 0.001118 196.132830
2 LightGBM_BAG_L2/T2 -132.971214 0.001108 197.085495
3 WeightedEnsemble_L2 -133.074144 0.001216 38.909853
4 LightGBM_BAG_L2/T5 -133.191017 0.001094 197.683682
5 LightGBM_BAG_L1/T8 -133.366596 0.000085 19.524052
6 LightGBM_BAG_L2/T3 -133.454683 0.001105 197.816994
7 LightGBM_BAG_L1/T7 -134.190162 0.000132 18.943960
8 LightGBM_BAG_L1/T3 -134.194130 0.000125 19.437752
9 LightGBM_BAG_L1/T2 -135.029528 0.000127 18.838973
10 LightGBM_BAG_L1/T1 -135.473207 0.000152 23.195276
11 LightGBM_BAG_L1/T5 -135.746640 0.000085 19.472352
12 LightGBM_BAG_L2/T4 -148.765464 0.001064 196.921249
13 LightGBM_BAG_L1/T9 -152.443903 0.000086 19.152268
14 LightGBM_BAG_L1/T6 -153.737220 0.000082 19.267452
15 LightGBM_BAG_L1/T4 -156.019224 0.000105 19.284673
pred_time_val_marginal fit_time_marginal stack_level can_infer \
0 0.001135 1.007823 3 True
1 0.000139 19.016073 2 True
2 0.000129 19.968738 2 True
3 0.000999 0.441841 2 True
4 0.000115 20.566925 2 True
5 0.000085 19.524052 1 True
6 0.000126 20.700237 2 True
7 0.000132 18.943960 1 True
8 0.000125 19.437752 1 True
9 0.000127 18.838973 1 True
10 0.000152 23.195276 1 True
11 0.000085 19.472352 1 True
12 0.000085 19.804492 2 True
13 0.000086 19.152268 1 True
14 0.000082 19.267452 1 True
15 0.000105 19.284673 1 True
fit_order
0 16
1 11
2 12
3 10
4 15
5 8
6 13
7 7
8 3
9 2
10 1
11 5
12 14
13 9
14 6
15 4 }
# Remember to set all negative values to zero
predictions_new_hpo = predictor_new_hpo.predict(test)
print((predictions_new_hpo < 0).sum())
predictions_new_hpo[predictions_new_hpo<0] = 0
predictions_new_hpo.describe()
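An equivalent, more idiomatic way to floor the predictions at zero is `Series.clip`. A minimal sketch with a toy series (the `preds` name and values are hypothetical, standing in for the predictor output):

```python
import pandas as pd

# toy predictions standing in for predictor.predict() output (hypothetical values)
preds = pd.Series([-3.2, 0.0, 12.5, -0.1, 40.8])

# clip(lower=0) replaces every negative prediction with 0 in one call,
# instead of the boolean-mask assignment used above
preds = preds.clip(lower=0)
print(preds.min())  # no negative values remain
```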
0
count    6493.000000
mean      195.552979
std       118.013786
min        44.590271
25%       108.154114
50%       166.215988
75%       265.650452
max       586.378479
Name: count, dtype: float64
# Submit predictions as before
submission_new_hpo = pd.read_csv('submission.csv')
submission_new_hpo["count"] = predictions_new_hpo
submission_new_hpo.to_csv("submission_new_hpo.csv", index=False)
!kaggle competitions submit -c bike-sharing-demand -f submission_new_hpo.csv -m "new features with hyperparameters"
100%|█████████████████████████████████████████| 188k/188k [00:00<00:00, 303kB/s]
Successfully submitted to Bike Sharing Demand
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6
fileName                     date                 description                        status    publicScore  privateScore
---------------------------  -------------------  ---------------------------------  --------  -----------  ------------
submission_new_hpo.csv       2023-01-23 15:19:10  new features with hyperparameters  complete  1.31738      1.31738
submission_new_features.csv  2023-01-23 05:13:22  new features                       complete  0.69366      0.69366
submission.csv               2023-01-23 04:09:07  first raw submission               complete  1.80760      1.80760
1.3173
# Second attempt at hyperparameter tuning, using AutoGluon's defaults
hyperparameters = 'default'
hyperparameter_tune_kwargs = 'auto'
predictor_new_hpo_2 = TabularPredictor(label="count").fit(
    train_data=train_new_hpo,
    time_limit=600,
    presets="best_quality",
    hyperparameters=hyperparameters,
    hyperparameter_tune_kwargs=hyperparameter_tune_kwargs,
)
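Passing `'auto'` lets AutoGluon choose the tuning strategy; an explicit dict gives finer control. A sketch of the dict form accepted by `fit()` (the specific values here are assumptions for illustration, not what `'auto'` resolved to in this run):

```python
# Explicit HPO configuration for TabularPredictor.fit();
# keys follow AutoGluon's hyperparameter_tune_kwargs dict format
hyperparameter_tune_kwargs = {
    "searcher": "auto",    # let AutoGluon pick the search algorithm
    "scheduler": "local",  # run trials on the local instance
    "num_trials": 5,       # cap on HPO trials per model (assumed value)
}
```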
No model was trained during hyperparameter tuning NeuralNetTorch_BAG_L2... Skipping this model.
Fitting model: LightGBMLarge_BAG_L2 ... Training model for up to 28.54s of the 71.13s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
-33.5697 = Validation score (-root_mean_squared_error)
32.43s = Training runtime
0.22s = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L3 ... Training model for up to 360.0s of the 35.29s of remaining time.
-32.4438 = Validation score (-root_mean_squared_error)
0.52s = Training runtime
0.0s = Validation runtime
AutoGluon training complete, total runtime = 565.42s ... Best model: "WeightedEnsemble_L3"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20230124_044035/")
predictor_new_hpo_2.fit_summary()
*** Summary of fit() ***
Estimated performance of each model:
model score_val pred_time_val fit_time pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 WeightedEnsemble_L3 -32.443774 2.589879 365.024152 0.000785 0.522403 3 True 18
1 WeightedEnsemble_L2 -32.653702 2.588854 199.551741 0.000812 0.460111 2 True 10
2 ExtraTreesMSE_BAG_L2 -32.681000 2.588577 248.454099 0.000148 11.326929 2 True 15
3 LightGBM_BAG_L2/T1 -32.869428 2.588549 259.950361 0.000120 22.823192 2 True 12
4 RandomForestMSE_BAG_L2 -32.944686 2.588613 269.409279 0.000184 32.282109 2 True 13
5 CatBoost_BAG_L2/T1 -33.020736 2.588523 271.446168 0.000094 34.318999 2 True 14
6 XGBoost_BAG_L2/T1 -33.070961 2.588547 263.750520 0.000119 26.623350 2 True 16
7 LightGBMLarge_BAG_L2 -33.569674 2.804723 269.557356 0.216294 32.430187 2 True 17
8 LightGBMXT_BAG_L2/T1 -33.737242 2.588577 260.944040 0.000149 23.816871 2 True 11
9 LightGBMLarge_BAG_L1 -34.085048 2.587255 41.989262 2.587255 41.989262 1 True 9
10 LightGBM_BAG_L1/T1 -34.439520 0.000121 46.736726 0.000121 46.736726 1 True 4
11 LightGBMXT_BAG_L1/T1 -34.929700 0.000129 55.958177 0.000129 55.958177 1 True 3
12 XGBoost_BAG_L1/T1 -35.840374 0.000089 33.858625 0.000089 33.858625 1 True 8
13 ExtraTreesMSE_BAG_L1 -38.938443 0.000170 8.325169 0.000170 8.325169 1 True 7
14 RandomForestMSE_BAG_L1 -38.987462 0.000279 12.223671 0.000279 12.223671 1 True 5
15 CatBoost_BAG_L1/T1 -40.977680 0.000129 37.359621 0.000129 37.359621 1 True 6
16 KNeighborsDist_BAG_L1 -115.038459 0.000122 0.316092 0.000122 0.316092 1 True 2
17 KNeighborsUnif_BAG_L1 -119.978810 0.000135 0.359826 0.000135 0.359826 1 True 1
Number of models trained: 18
Types of models trained:
{'StackerEnsembleModel_KNN', 'StackerEnsembleModel_CatBoost', 'StackerEnsembleModel_LGB', 'StackerEnsembleModel_RF', 'StackerEnsembleModel_XGBoost', 'WeightedEnsembleModel', 'StackerEnsembleModel_XT'}
Bagging used: True (with 8 folds)
Multi-layer stack-ensembling used: True (with 3 levels)
Feature Metadata (Processed):
(raw dtype, special dtypes):
('category', []) : 2 | ['season', 'weather']
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 5 | ['humidity', 'hour', 'month', 'day', 'weekday']
('int', ['bool']) : 3 | ['holiday', 'workingday', 'year']
Plot summary of models saved to file: AutogluonModels/ag-20230124_044035/SummaryOfModels.html
*** End of fit() summary ***
{'model_types': {'KNeighborsUnif_BAG_L1': 'StackerEnsembleModel_KNN',
'KNeighborsDist_BAG_L1': 'StackerEnsembleModel_KNN',
'LightGBMXT_BAG_L1/T1': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L1/T1': 'StackerEnsembleModel_LGB',
'RandomForestMSE_BAG_L1': 'StackerEnsembleModel_RF',
'CatBoost_BAG_L1/T1': 'StackerEnsembleModel_CatBoost',
'ExtraTreesMSE_BAG_L1': 'StackerEnsembleModel_XT',
'XGBoost_BAG_L1/T1': 'StackerEnsembleModel_XGBoost',
'LightGBMLarge_BAG_L1': 'StackerEnsembleModel_LGB',
'WeightedEnsemble_L2': 'WeightedEnsembleModel',
'LightGBMXT_BAG_L2/T1': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L2/T1': 'StackerEnsembleModel_LGB',
'RandomForestMSE_BAG_L2': 'StackerEnsembleModel_RF',
'CatBoost_BAG_L2/T1': 'StackerEnsembleModel_CatBoost',
'ExtraTreesMSE_BAG_L2': 'StackerEnsembleModel_XT',
'XGBoost_BAG_L2/T1': 'StackerEnsembleModel_XGBoost',
'LightGBMLarge_BAG_L2': 'StackerEnsembleModel_LGB',
'WeightedEnsemble_L3': 'WeightedEnsembleModel'},
'model_performance': {'KNeighborsUnif_BAG_L1': -119.97880966975461,
'KNeighborsDist_BAG_L1': -115.038459148802,
'LightGBMXT_BAG_L1/T1': -34.9297003401567,
'LightGBM_BAG_L1/T1': -34.43952035710387,
'RandomForestMSE_BAG_L1': -38.987461831485355,
'CatBoost_BAG_L1/T1': -40.97768035938449,
'ExtraTreesMSE_BAG_L1': -38.9384425957686,
'XGBoost_BAG_L1/T1': -35.840374362701326,
'LightGBMLarge_BAG_L1': -34.085047986742616,
'WeightedEnsemble_L2': -32.65370241912103,
'LightGBMXT_BAG_L2/T1': -33.73724153874836,
'LightGBM_BAG_L2/T1': -32.869428476139554,
'RandomForestMSE_BAG_L2': -32.944685579678655,
'CatBoost_BAG_L2/T1': -33.02073563789231,
'ExtraTreesMSE_BAG_L2': -32.68099996467634,
'XGBoost_BAG_L2/T1': -33.07096072504919,
'LightGBMLarge_BAG_L2': -33.56967448130482,
'WeightedEnsemble_L3': -32.4437742724192},
'model_best': 'WeightedEnsemble_L3',
'model_paths': {'KNeighborsUnif_BAG_L1': 'AutogluonModels/ag-20230124_044035/models/KNeighborsUnif_BAG_L1/',
'KNeighborsDist_BAG_L1': 'AutogluonModels/ag-20230124_044035/models/KNeighborsDist_BAG_L1/',
'LightGBMXT_BAG_L1/T1': '/root/AutogluonModels/ag-20230124_044035/models/LightGBMXT_BAG_L1/T1/',
'LightGBM_BAG_L1/T1': '/root/AutogluonModels/ag-20230124_044035/models/LightGBM_BAG_L1/T1/',
'RandomForestMSE_BAG_L1': 'AutogluonModels/ag-20230124_044035/models/RandomForestMSE_BAG_L1/',
'CatBoost_BAG_L1/T1': '/root/AutogluonModels/ag-20230124_044035/models/CatBoost_BAG_L1/T1/',
'ExtraTreesMSE_BAG_L1': 'AutogluonModels/ag-20230124_044035/models/ExtraTreesMSE_BAG_L1/',
'XGBoost_BAG_L1/T1': '/root/AutogluonModels/ag-20230124_044035/models/XGBoost_BAG_L1/T1/',
'LightGBMLarge_BAG_L1': 'AutogluonModels/ag-20230124_044035/models/LightGBMLarge_BAG_L1/',
'WeightedEnsemble_L2': 'AutogluonModels/ag-20230124_044035/models/WeightedEnsemble_L2/',
'LightGBMXT_BAG_L2/T1': '/root/AutogluonModels/ag-20230124_044035/models/LightGBMXT_BAG_L2/T1/',
'LightGBM_BAG_L2/T1': '/root/AutogluonModels/ag-20230124_044035/models/LightGBM_BAG_L2/T1/',
'RandomForestMSE_BAG_L2': 'AutogluonModels/ag-20230124_044035/models/RandomForestMSE_BAG_L2/',
'CatBoost_BAG_L2/T1': '/root/AutogluonModels/ag-20230124_044035/models/CatBoost_BAG_L2/T1/',
'ExtraTreesMSE_BAG_L2': 'AutogluonModels/ag-20230124_044035/models/ExtraTreesMSE_BAG_L2/',
'XGBoost_BAG_L2/T1': '/root/AutogluonModels/ag-20230124_044035/models/XGBoost_BAG_L2/T1/',
'LightGBMLarge_BAG_L2': 'AutogluonModels/ag-20230124_044035/models/LightGBMLarge_BAG_L2/',
'WeightedEnsemble_L3': 'AutogluonModels/ag-20230124_044035/models/WeightedEnsemble_L3/'},
'model_fit_times': {'KNeighborsUnif_BAG_L1': 0.35982608795166016,
'KNeighborsDist_BAG_L1': 0.31609177589416504,
'LightGBMXT_BAG_L1/T1': 55.95817732810974,
'LightGBM_BAG_L1/T1': 46.73672556877136,
'RandomForestMSE_BAG_L1': 12.223671197891235,
'CatBoost_BAG_L1/T1': 37.35962128639221,
'ExtraTreesMSE_BAG_L1': 8.325169086456299,
'XGBoost_BAG_L1/T1': 33.858625173568726,
'LightGBMLarge_BAG_L1': 41.989262104034424,
'WeightedEnsemble_L2': 0.4601109027862549,
'LightGBMXT_BAG_L2/T1': 23.81687068939209,
'LightGBM_BAG_L2/T1': 22.82319164276123,
'RandomForestMSE_BAG_L2': 32.28210926055908,
'CatBoost_BAG_L2/T1': 34.31899881362915,
'ExtraTreesMSE_BAG_L2': 11.326929092407227,
'XGBoost_BAG_L2/T1': 26.623350381851196,
'LightGBMLarge_BAG_L2': 32.43018651008606,
'WeightedEnsemble_L3': 0.5224027633666992},
'model_pred_times': {'KNeighborsUnif_BAG_L1': 0.0001347064971923828,
'KNeighborsDist_BAG_L1': 0.0001220703125,
'LightGBMXT_BAG_L1/T1': 0.0001285076141357422,
'LightGBM_BAG_L1/T1': 0.00012111663818359375,
'RandomForestMSE_BAG_L1': 0.00027871131896972656,
'CatBoost_BAG_L1/T1': 0.00012946128845214844,
'ExtraTreesMSE_BAG_L1': 0.0001697540283203125,
'XGBoost_BAG_L1/T1': 8.893013000488281e-05,
'LightGBMLarge_BAG_L1': 2.5872554779052734,
'WeightedEnsemble_L2': 0.0008118152618408203,
'LightGBMXT_BAG_L2/T1': 0.00014853477478027344,
'LightGBM_BAG_L2/T1': 0.00011992454528808594,
'RandomForestMSE_BAG_L2': 0.0001838207244873047,
'CatBoost_BAG_L2/T1': 9.441375732421875e-05,
'ExtraTreesMSE_BAG_L2': 0.00014829635620117188,
'XGBoost_BAG_L2/T1': 0.00011873245239257812,
'LightGBMLarge_BAG_L2': 0.2162938117980957,
'WeightedEnsemble_L3': 0.0007851123809814453},
'num_bag_folds': 8,
'max_stack_level': 3,
'model_hyperparams': {'KNeighborsUnif_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'KNeighborsDist_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'LightGBMXT_BAG_L1/T1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L1/T1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'RandomForestMSE_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'CatBoost_BAG_L1/T1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'ExtraTreesMSE_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'XGBoost_BAG_L1/T1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMLarge_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'WeightedEnsemble_L2': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMXT_BAG_L2/T1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L2/T1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'RandomForestMSE_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'CatBoost_BAG_L2/T1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'ExtraTreesMSE_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'XGBoost_BAG_L2/T1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMLarge_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'WeightedEnsemble_L3': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True}},
'leaderboard': model score_val pred_time_val fit_time \
0 WeightedEnsemble_L3 -32.443774 2.589879 365.024152
1 WeightedEnsemble_L2 -32.653702 2.588854 199.551741
2 ExtraTreesMSE_BAG_L2 -32.681000 2.588577 248.454099
3 LightGBM_BAG_L2/T1 -32.869428 2.588549 259.950361
4 RandomForestMSE_BAG_L2 -32.944686 2.588613 269.409279
5 CatBoost_BAG_L2/T1 -33.020736 2.588523 271.446168
6 XGBoost_BAG_L2/T1 -33.070961 2.588547 263.750520
7 LightGBMLarge_BAG_L2 -33.569674 2.804723 269.557356
8 LightGBMXT_BAG_L2/T1 -33.737242 2.588577 260.944040
9 LightGBMLarge_BAG_L1 -34.085048 2.587255 41.989262
10 LightGBM_BAG_L1/T1 -34.439520 0.000121 46.736726
11 LightGBMXT_BAG_L1/T1 -34.929700 0.000129 55.958177
12 XGBoost_BAG_L1/T1 -35.840374 0.000089 33.858625
13 ExtraTreesMSE_BAG_L1 -38.938443 0.000170 8.325169
14 RandomForestMSE_BAG_L1 -38.987462 0.000279 12.223671
15 CatBoost_BAG_L1/T1 -40.977680 0.000129 37.359621
16 KNeighborsDist_BAG_L1 -115.038459 0.000122 0.316092
17 KNeighborsUnif_BAG_L1 -119.978810 0.000135 0.359826
pred_time_val_marginal fit_time_marginal stack_level can_infer \
0 0.000785 0.522403 3 True
1 0.000812 0.460111 2 True
2 0.000148 11.326929 2 True
3 0.000120 22.823192 2 True
4 0.000184 32.282109 2 True
5 0.000094 34.318999 2 True
6 0.000119 26.623350 2 True
7 0.216294 32.430187 2 True
8 0.000149 23.816871 2 True
9 2.587255 41.989262 1 True
10 0.000121 46.736726 1 True
11 0.000129 55.958177 1 True
12 0.000089 33.858625 1 True
13 0.000170 8.325169 1 True
14 0.000279 12.223671 1 True
15 0.000129 37.359621 1 True
16 0.000122 0.316092 1 True
17 0.000135 0.359826 1 True
fit_order
0 18
1 10
2 15
3 12
4 13
5 14
6 16
7 17
8 11
9 9
10 4
11 3
12 8
13 7
14 5
15 6
16 2
17 1 }
predictor_new_hpo_2.leaderboard(silent=True).plot(kind="bar", x="model", y="score_val")
<AxesSubplot:xlabel='model'>
# test has no ground-truth "count" labels, so evaluating against this
# zero-filled column is not meaningful (hence r2 = 0 and pearsonr = NaN below)
test["count"] = 0
performance_new_hpo_2 = predictor_new_hpo_2.evaluate(test)
print("The performance indicators are : \n", performance_new_hpo_2)
/usr/local/lib/python3.8/dist-packages/scipy/stats/stats.py:4023: PearsonRConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
warnings.warn(PearsonRConstantInputWarning())
Evaluation: root_mean_squared_error on test data: -257.28843951366616
Note: Scores are always higher_is_better. This metric score can be multiplied by -1 to get the metric value.
Evaluations on test data:
{
"root_mean_squared_error": -257.28843951366616,
"mean_squared_error": -66197.34110737746,
"mean_absolute_error": -189.76286823456155,
"r2": 0.0,
"pearsonr": NaN,
"median_absolute_error": -147.41567993164062
}
The performance indicators are :
{'root_mean_squared_error': -257.28843951366616, 'mean_squared_error': -66197.34110737746, 'mean_absolute_error': -189.76286823456155, 'r2': 0.0, 'pearsonr': nan, 'median_absolute_error': -147.41567993164062}
# Remember to set all negative values to zero
predictions_new_hpo_2 = predictor_new_hpo_2.predict(test)
print((predictions_new_hpo_2 < 0).sum())
predictions_new_hpo_2[predictions_new_hpo_2<0] = 0
predictions_new_hpo_2.describe()
0
count    6493.000000
mean      189.762863
std       173.758575
min         3.497172
25%        45.154335
50%       147.415680
75%       278.535583
max       904.307739
Name: count, dtype: float64
# Submit predictions as before
submission_new_hpo_2 = pd.read_csv('submission.csv')
submission_new_hpo_2["count"] = predictions_new_hpo_2
submission_new_hpo_2.to_csv("submission_new_hpo_2.csv", index=False)
!kaggle competitions submit -c bike-sharing-demand -f submission_new_hpo_2.csv -m "new features with hyperparameters 2"
100%|█████████████████████████████████████████| 188k/188k [00:00<00:00, 341kB/s]
Successfully submitted to Bike Sharing Demand
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6
fileName                     date                 description                          status    publicScore  privateScore
---------------------------  -------------------  -----------------------------------  --------  -----------  ------------
submission_new_hpo_2.csv     2023-01-24 04:58:04  new features with hyperparameters 2  complete  0.44585      0.44585
submission_new_features.csv  2023-01-24 04:22:35  new features 2                       complete  0.69366      0.69366
submission_new_hpo.csv       2023-01-23 15:19:10  new features with hyperparameters    complete  1.31738      1.31738
submission_new_features.csv  2023-01-23 05:13:22  new features                         complete  0.69366      0.69366
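The Kaggle scores above are RMSLE (root mean squared logarithmic error), which is also why negative predictions must be floored at zero: `log1p` of a negative count is undefined. A minimal sketch of the metric with toy arrays:

```python
import math

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error, the Bike Sharing Demand metric."""
    n = len(y_true)
    return math.sqrt(
        sum((math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)) / n
    )

# perfect predictions score 0; errors on large counts are damped by the log
print(rmsle([10, 100, 250], [10, 100, 250]))  # 0.0
```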
# Take the top model score from each training run and create a line plot to show improvement
# You can create these in the notebook and save them to PNG or use some other tool (e.g. google sheets, excel)
fig = pd.DataFrame(
{
"model": ["initial", "add_features", "hpo", "add features 2", "hpo 2"],
"score": [-52.885174, -30.024358, -132.758746, -32.3882,-32.443774]
}
).plot(x="model", y="score", figsize=(8, 6)).get_figure()
fig.savefig('model_train_score_2.png')
# Take the five Kaggle scores and create a line plot to show improvement
fig = pd.DataFrame(
{
"test_eval": ["initial", "add_features", "hpo","add features 2", "hpo 2"],
"score": [1.80760, 0.69366, 1.31738, 0.69366, 0.44585]
}
).plot(x="test_eval", y="score", figsize=(8, 6)).get_figure()
fig.savefig('model_test_score_2.png')
# The hyperparameter settings used in each training run, with the Kaggle score as the result
pd.DataFrame({
"model": ["initial", "add_features", "hpo","add features 2", "hpo 2"],
"time_limit": [600, 600, 600, 600, 600],
"presets": ["best_quality", "best_quality", "best_quality", "best_quality", "best_quality"],
"hyperparameters": ['default','default', "{'GBM: "+ str(gbm_options)+"}, {NN: "+ str(nn_options)+"}", 'default','default'],
"hyperparameter_tune_kwargs":["-", "-", "auto","-", "{'searcher':'auto'}"],
"score": [1.80760, 0.69366, 1.31738, 0.69366, 0.44585]
})
| | model | time_limit | presets | hyperparameters | hyperparameter_tune_kwargs | score |
|---|---|---|---|---|---|---|
| 0 | initial | 600 | best_quality | default | - | 1.80760 |
| 1 | add_features | 600 | best_quality | default | - | 0.69366 |
| 2 | hpo | 600 | best_quality | {'GBM: {'num_boost_round': 100, 'num_leaves': Int: lower=26, upper=66}}, {NN: {'num_epochs': 10, 'learning_rate': Real: lower=0.0001, upper=0.01, 'activation': Categorical['relu', 'softrelu', 'tanh'], 'layers': Categorical[[100], [1000], [200, 100], [300, 200, 100]], 'dropout_prob': Real: lower=0.0, upper=0.5}} | auto | 1.31738 |
| 3 | add features 2 | 600 | best_quality | default | - | 0.69366 |
| 4 | hpo 2 | 600 | best_quality | default | {'searcher':'auto'} | 0.44585 |
!tar --version
tar (GNU tar) 1.30
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by John Gilmore and Jay Fenlason.
ls
AutogluonModels/ submission.csv bike-sharing-demand.zip submission_new_features.csv cd0385-project-starter/ submission_new_features_2.csv histogram_hours_feature.png submission_new_hpo.csv model_test_score.png submission_new_hpo_2.csv model_test_score_2.png test.csv model_train_score.png train.csv model_train_score_2.png training_runs.png sampleSubmission.csv
!tar --exclude='AutogluonModels' --exclude='./.*' -zcvf backup.tar.gz .
./ ./sampleSubmission.csv ./train.csv ./histogram_hours_feature.png ./model_train_score.png ./backup.tar.gz ./test.csv ./training_runs.png ./submission.csv ./submission_new_hpo_2.csv ./model_test_score.png ./submission_new_features_2.csv ./submission_new_hpo.csv ./model_train_score_2.png ./submission_new_features.csv ./cd0385-project-starter/ ./cd0385-project-starter/.git/ ./cd0385-project-starter/.git/refs/ ./cd0385-project-starter/.git/refs/heads/ ./cd0385-project-starter/.git/refs/heads/main ./cd0385-project-starter/.git/refs/remotes/ ./cd0385-project-starter/.git/refs/remotes/origin/ ./cd0385-project-starter/.git/refs/remotes/origin/HEAD ./cd0385-project-starter/.git/refs/tags/ ./cd0385-project-starter/.git/index ./cd0385-project-starter/.git/hooks/ ./cd0385-project-starter/.git/hooks/pre-push.sample ./cd0385-project-starter/.git/hooks/pre-merge-commit.sample ./cd0385-project-starter/.git/hooks/pre-applypatch.sample ./cd0385-project-starter/.git/hooks/applypatch-msg.sample ./cd0385-project-starter/.git/hooks/post-update.sample ./cd0385-project-starter/.git/hooks/pre-rebase.sample ./cd0385-project-starter/.git/hooks/fsmonitor-watchman.sample ./cd0385-project-starter/.git/hooks/prepare-commit-msg.sample ./cd0385-project-starter/.git/hooks/commit-msg.sample ./cd0385-project-starter/.git/hooks/pre-commit.sample ./cd0385-project-starter/.git/hooks/push-to-checkout.sample ./cd0385-project-starter/.git/hooks/update.sample ./cd0385-project-starter/.git/hooks/pre-receive.sample ./cd0385-project-starter/.git/description ./cd0385-project-starter/.git/info/ ./cd0385-project-starter/.git/info/exclude ./cd0385-project-starter/.git/HEAD ./cd0385-project-starter/.git/objects/ ./cd0385-project-starter/.git/objects/info/ ./cd0385-project-starter/.git/objects/pack/ ./cd0385-project-starter/.git/objects/pack/pack-eadf976caa534391b6423e05e4c7e0705fcccd87.idx ./cd0385-project-starter/.git/objects/pack/pack-eadf976caa534391b6423e05e4c7e0705fcccd87.pack 
./cd0385-project-starter/.git/config ./cd0385-project-starter/.git/branches/ ./cd0385-project-starter/.git/logs/ ./cd0385-project-starter/.git/logs/refs/ ./cd0385-project-starter/.git/logs/refs/heads/ ./cd0385-project-starter/.git/logs/refs/heads/main ./cd0385-project-starter/.git/logs/refs/remotes/ ./cd0385-project-starter/.git/logs/refs/remotes/origin/ ./cd0385-project-starter/.git/logs/refs/remotes/origin/HEAD ./cd0385-project-starter/.git/logs/HEAD ./cd0385-project-starter/.git/packed-refs ./cd0385-project-starter/.ipynb_checkpoints/ ./cd0385-project-starter/.ipynb_checkpoints/README-checkpoint.md ./cd0385-project-starter/.github/ ./cd0385-project-starter/.github/workflows/ ./cd0385-project-starter/.github/workflows/manual.yml ./cd0385-project-starter/CODEOWNERS ./cd0385-project-starter/README.md ./cd0385-project-starter/project/ ./cd0385-project-starter/project/project-template.ipynb ./cd0385-project-starter/project/report-template.md ./cd0385-project-starter/project/.ipynb_checkpoints/ ./cd0385-project-starter/project/.ipynb_checkpoints/README-checkpoint.md ./cd0385-project-starter/project/.ipynb_checkpoints/report-template-checkpoint.md ./cd0385-project-starter/project/.ipynb_checkpoints/project-template-checkpoint.ipynb ./cd0385-project-starter/project/img/ ./cd0385-project-starter/project/img/model_train_score.png ./cd0385-project-starter/project/img/sagemaker-studio-git1.png ./cd0385-project-starter/project/img/model_test_score.png ./cd0385-project-starter/project/img/sagemaker-studio-git2.png ./cd0385-project-starter/project/README.md ./bike-sharing-demand.zip ./model_test_score_2.png tar: .: file changed as we read it
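The `file changed as we read it` warning appears because `backup.tar.gz` is being written into the same directory it is archiving. Writing the archive outside the tree (or excluding it) avoids the warning. A minimal sketch in a scratch directory (all file and directory names here are hypothetical):

```shell
# build a scratch directory with one file to keep and one directory to skip
mkdir -p demo/AutogluonModels
echo "data" > demo/train.csv
echo "model" > demo/AutogluonModels/model.pkl

# writing the archive outside the tree being archived avoids the
# "file changed as we read it" warning seen above
tar --exclude='AutogluonModels' -zcf demo.tar.gz -C demo .

# list the archive contents: train.csv is included, AutogluonModels is not
tar -tzf demo.tar.gz
```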
!tar --help
Usage: tar [OPTION...] [FILE]...
GNU 'tar' saves many files together into a single tape or disk archive, and can
restore individual files from the archive.
Examples:
tar -cf archive.tar foo bar # Create archive.tar from files foo and bar.
tar -tvf archive.tar # List all files in archive.tar verbosely.
tar -xf archive.tar # Extract all files from archive.tar.
Local file name selection:
--add-file=FILE add given FILE to the archive (useful if its name
starts with a dash)
-C, --directory=DIR change to directory DIR
--exclude=PATTERN exclude files, given as a PATTERN
--exclude-backups exclude backup and lock files
--exclude-caches exclude contents of directories containing
CACHEDIR.TAG, except for the tag file itself
--exclude-caches-all exclude directories containing CACHEDIR.TAG
--exclude-caches-under exclude everything under directories containing
CACHEDIR.TAG
--exclude-ignore=FILE read exclude patterns for each directory from
FILE, if it exists
--exclude-ignore-recursive=FILE
read exclude patterns for each directory and its
subdirectories from FILE, if it exists
--exclude-tag=FILE exclude contents of directories containing FILE,
except for FILE itself
--exclude-tag-all=FILE exclude directories containing FILE
--exclude-tag-under=FILE exclude everything under directories
containing FILE
--exclude-vcs exclude version control system directories
--exclude-vcs-ignores read exclude patterns from the VCS ignore files
--no-null disable the effect of the previous --null option
--no-recursion avoid descending automatically in directories
--no-unquote do not unquote input file or member names
--no-verbatim-files-from -T treats file names starting with dash as
options (default)
--null -T reads null-terminated names; implies
--verbatim-files-from
--recursion recurse into directories (default)
-T, --files-from=FILE get names to extract or create from FILE
--unquote unquote input file or member names (default)
--verbatim-files-from -T reads file names verbatim (no escape or option
handling)
-X, --exclude-from=FILE exclude patterns listed in FILE
File name matching options (affect both exclude and include patterns):
--anchored patterns match file name start
--ignore-case ignore case
--no-anchored patterns match after any '/' (default for
exclusion)
--no-ignore-case case sensitive matching (default)
--no-wildcards verbatim string matching
--no-wildcards-match-slash wildcards do not match '/'
--wildcards use wildcards (default for exclusion)
--wildcards-match-slash wildcards match '/' (default for exclusion)
Main operation mode:
-A, --catenate, --concatenate append tar files to an archive
-c, --create create a new archive
-d, --diff, --compare find differences between archive and file system
--delete delete from the archive (not on mag tapes!)
-r, --append append files to the end of an archive
-t, --list list the contents of an archive
--test-label test the archive volume label and exit
-u, --update only append files newer than copy in archive
-x, --extract, --get extract files from an archive
Operation modifiers:
--check-device check device numbers when creating incremental
archives (default)
-g, --listed-incremental=FILE handle new GNU-format incremental backup
-G, --incremental handle old GNU-format incremental backup
--hole-detection=TYPE technique to detect holes
--ignore-failed-read do not exit with nonzero on unreadable files
--level=NUMBER dump level for created listed-incremental archive
-n, --seek archive is seekable
--no-check-device do not check device numbers when creating
incremental archives
--no-seek archive is not seekable
--occurrence[=NUMBER] process only the NUMBERth occurrence of each file
in the archive; this option is valid only in
conjunction with one of the subcommands --delete,
--diff, --extract or --list and when a list of
files is given either on the command line or via
the -T option; NUMBER defaults to 1
--sparse-version=MAJOR[.MINOR]
set version of the sparse format to use (implies
--sparse)
-S, --sparse handle sparse files efficiently
Overwrite control:
-k, --keep-old-files don't replace existing files when extracting,
treat them as errors
--keep-directory-symlink preserve existing symlinks to directories when
extracting
--keep-newer-files don't replace existing files that are newer than
their archive copies
--no-overwrite-dir preserve metadata of existing directories
--one-top-level[=DIR] create a subdirectory to avoid having loose files
extracted
--overwrite overwrite existing files when extracting
--overwrite-dir overwrite metadata of existing directories when
extracting (default)
--recursive-unlink empty hierarchies prior to extracting directory
--remove-files remove files after adding them to the archive
--skip-old-files don't replace existing files when extracting,
silently skip over them
-U, --unlink-first remove each file prior to extracting over it
-W, --verify attempt to verify the archive after writing it
Select output stream:
--ignore-command-error ignore exit codes of children
--no-ignore-command-error treat non-zero exit codes of children as
error
-O, --to-stdout extract files to standard output
--to-command=COMMAND pipe extracted files to another program
Handling of file attributes:
--atime-preserve[=METHOD] preserve access times on dumped files, either
by restoring the times after reading
(METHOD='replace'; default) or by not setting the
times in the first place (METHOD='system')
--clamp-mtime only set time when the file is more recent than
what was given with --mtime
--delay-directory-restore delay setting modification times and
permissions of extracted directories until the end
of extraction
--group=NAME force NAME as group for added files
--group-map=FILE use FILE to map file owner GIDs and names
--mode=CHANGES force (symbolic) mode CHANGES for added files
--mtime=DATE-OR-FILE set mtime for added files from DATE-OR-FILE
-m, --touch don't extract file modified time
--no-delay-directory-restore
cancel the effect of --delay-directory-restore
option
--no-same-owner extract files as yourself (default for ordinary
users)
--no-same-permissions apply the user's umask when extracting permissions
from the archive (default for ordinary users)
--numeric-owner always use numbers for user/group names
--owner=NAME force NAME as owner for added files
--owner-map=FILE use FILE to map file owner UIDs and names
-p, --preserve-permissions, --same-permissions
extract information about file permissions
(default for superuser)
--same-owner try extracting files with the same ownership as
exists in the archive (default for superuser)
-s, --preserve-order, --same-order
member arguments are listed in the same order as
the files in the archive
--sort=ORDER directory sorting order: none (default), name or
inode
Handling of extended file attributes:
--acls Enable the POSIX ACLs support
--no-acls Disable the POSIX ACLs support
--no-selinux Disable the SELinux context support
--no-xattrs Disable extended attributes support
--selinux Enable the SELinux context support
--xattrs Enable extended attributes support
--xattrs-exclude=MASK specify the exclude pattern for xattr keys
--xattrs-include=MASK specify the include pattern for xattr keys
Device selection and switching:
-f, --file=ARCHIVE use archive file or device ARCHIVE
--force-local archive file is local even if it has a colon
-F, --info-script=NAME, --new-volume-script=NAME
run script at end of each tape (implies -M)
-L, --tape-length=NUMBER change tape after writing NUMBER x 1024 bytes
-M, --multi-volume create/list/extract multi-volume archive
--rmt-command=COMMAND use given rmt COMMAND instead of rmt
--rsh-command=COMMAND use remote COMMAND instead of rsh
--volno-file=FILE use/update the volume number in FILE
Device blocking:
-b, --blocking-factor=BLOCKS BLOCKS x 512 bytes per record
-B, --read-full-records reblock as we read (for 4.2BSD pipes)
-i, --ignore-zeros ignore zeroed blocks in archive (means EOF)
--record-size=NUMBER NUMBER of bytes per record, multiple of 512
Archive format selection:
-H, --format=FORMAT create archive of the given format
FORMAT is one of the following:
gnu GNU tar 1.13.x format
oldgnu GNU format as per tar <= 1.12
pax POSIX 1003.1-2001 (pax) format
posix same as pax
ustar POSIX 1003.1-1988 (ustar) format
v7 old V7 tar format
--old-archive, --portability
same as --format=v7
--pax-option=keyword[[:]=value][,keyword[[:]=value]]...
control pax keywords
--posix same as --format=posix
-V, --label=TEXT create archive with volume name TEXT; at
list/extract time, use TEXT as a globbing pattern
for volume name
Compression options:
-a, --auto-compress use archive suffix to determine the compression
program
-I, --use-compress-program=PROG
filter through PROG (must accept -d)
-j, --bzip2 filter the archive through bzip2
-J, --xz filter the archive through xz
--lzip filter the archive through lzip
--lzma filter the archive through xz
--lzop filter the archive through lzop
--no-auto-compress do not use archive suffix to determine the
compression program
-z, --gzip, --gunzip, --ungzip filter the archive through gzip
--zstd filter the archive through zstd
-Z, --compress, --uncompress filter the archive through compress
Local file selection:
--backup[=CONTROL] backup before removal, choose version CONTROL
-h, --dereference follow symlinks; archive and dump the files they
point to
--hard-dereference follow hard links; archive and dump the files they
refer to
-K, --starting-file=MEMBER-NAME
begin at member MEMBER-NAME when reading the
archive
--newer-mtime=DATE compare date and time when data changed only
-N, --newer=DATE-OR-FILE, --after-date=DATE-OR-FILE
only store files newer than DATE-OR-FILE
--one-file-system stay in local file system when creating archive
-P, --absolute-names don't strip leading '/'s from file names
--suffix=STRING backup before removal, override usual suffix ('~'
unless overridden by environment variable
SIMPLE_BACKUP_SUFFIX)
File name transformations:
--strip-components=NUMBER strip NUMBER leading components from file
names on extraction
--transform=EXPRESSION, --xform=EXPRESSION
use sed replace EXPRESSION to transform file
names
Informative output:
--checkpoint[=NUMBER] display progress messages every NUMBERth record
(default 10)
--checkpoint-action=ACTION execute ACTION on each checkpoint
--full-time print file time to its full resolution
--index-file=FILE send verbose output to FILE
-l, --check-links print a message if not all links are dumped
--no-quote-chars=STRING disable quoting for characters from STRING
--quote-chars=STRING additionally quote characters from STRING
--quoting-style=STYLE set name quoting style; see below for valid STYLE
values
-R, --block-number show block number within archive with each message
--show-defaults show tar defaults
--show-omitted-dirs when listing or extracting, list each directory
that does not match search criteria
--show-snapshot-field-ranges
show valid ranges for snapshot-file fields
--show-transformed-names, --show-stored-names
show file or archive names after transformation
--totals[=SIGNAL] print total bytes after processing the archive;
with an argument - print total bytes when this
SIGNAL is delivered; Allowed signals are: SIGHUP,
SIGQUIT, SIGINT, SIGUSR1 and SIGUSR2; the names
without SIG prefix are also accepted
--utc print file modification times in UTC
-v, --verbose verbosely list files processed
--warning=KEYWORD warning control
-w, --interactive, --confirmation
ask for confirmation for every action
Compatibility options:
-o when creating, same as --old-archive; when
extracting, same as --no-same-owner
Other options:
-?, --help give this help list
--restrict disable use of some potentially harmful options
--usage give a short usage message
--version print program version
Mandatory or optional arguments to long options are also mandatory or optional
for any corresponding short options.
The backup suffix is '~', unless set with --suffix or SIMPLE_BACKUP_SUFFIX.
The version control may be set with --backup or VERSION_CONTROL, values are:
none, off never make backups
t, numbered make numbered backups
nil, existing numbered if numbered backups exist, simple otherwise
never, simple always make simple backups
Valid arguments for the --quoting-style option are:
literal
shell
shell-always
shell-escape
shell-escape-always
c
c-maybe
escape
locale
clocale
*This* tar defaults to:
--format=gnu -f- -b20 --quoting-style=escape --rmt-command=/usr/sbin/rmt
--rsh-command=/usr/bin/rsh
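As a quick sanity check of the options listed above, the short shell snippet below creates a gzip-compressed archive, lists it, and extracts it with `--strip-components`. It is a minimal sketch: the directory and file names (`demo/`, `out/`, `sample.txt`) are illustrative, not part of the project data.

```shell
# Create a scratch directory with a sample file (names are illustrative)
mkdir -p demo/data
echo "hello" > demo/data/sample.txt

# -c create, -z gzip, -f archive file; -C switches directory before archiving
tar -czf demo.tar.gz -C demo data

# -t lists the archive contents without extracting
tar -tzf demo.tar.gz

# -x extract; --strip-components=1 drops the leading 'data/' path component
mkdir -p out
tar -xzf demo.tar.gz -C out --strip-components=1
cat out/sample.txt
```

Because `-a`/`--auto-compress` picks the filter from the archive suffix, `tar -caf demo.tar.gz ...` would behave the same as the explicit `-z` here.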